Descriptive Statistics - Numerical Measures

Data Science for
Managerial Decisions
Concepts covered in the last two sessions?
2
Descriptive Statistics - Numerical Measures
◼ Measures of Location
◼ Measures of Variability
3
Measures of Location
Mean
If the measures are computed
Median for data from a sample,
Mode they are called sample statistics.
Percentiles
Quartiles If the measures are computed
for data from a population,
they are called population parameters.
A sample statistic is referred to

as the point estimator of the
corresponding population parameter.
4
Mean
◼ Perhaps the most important measure of location is the mean.
◼ The mean provides a measure of central location.
◼ The mean of a data set is the average of all the data values.
◼ The sample mean 𝑥ҧ is the point estimator of the population mean 𝜇.
Sample Mean 𝑥ҧ
Sum of the values

of the n observations
x i
x=
n
Number of observations
in the sample
6
Population Mean 𝜇
Sum of the values

of the N observations
x i
=
N
Number of observations
in the population
7
Sample Mean
Example: Apartment Rents
Seventy apartments were randomly sampled in a small college town. The monthly rent prices for these
apartments are listed below.
445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440
Sample Mean
Seventy apartments were randomly sampled in a small college town. The monthly rent prices for these
apartments are listed below.
445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440
x=
 x i
=
34, 356
= 490.80
n 70
Mean
◼ Imagine ten folks are sitting on bar stools in a Seattle
◼ Each of them earns $35,000 a year, which makes the mean annual
income for the group as ………….
◼ Now Jack Dorsey walks into the bar; assume that Dorsey has annual
income of $1 billion
◼ Now the mean annual income of the bar patrons rises to ……….
10
Mean
◼ Imagine ten folks are sitting on bar stools in a Seattle
◼ Each of them earns $35,000 a year, which makes the mean annual
income for the group as $35,000.
◼ Now Jack Dorsey walks into the bar; assume that Dorsey has annual
income of $1 billion
◼ Now the mean annual income of the bar patrons rises to $90,940,909.
11
Mean
Income Freq Total

$35,000 10 $350,000
$1,000,000,000 1 $1,000,000,000
Avg $90,940,909
12
Limitations of using Mean
◼ If I were to describe the patrons of this bar as having an average annual

income of $91 million, the statement would be statistically correct yet
grossly misleading
◼ This isn’t a bar where multimillionaires hang out; it’s a bar where a bunch
of folks with relatively low incomes happen to be sitting next to Dorsey
◼ Because there has been explosive growth in incomes at the top end of the
distribution, the average income gets heavily skewed by the ultra rich
13
Quiz Time
◼ Calculation of mean is affected by __________ values
◼ If there are very high or very low values in that data that don’t look
like most of the data, then the mean is _______ ____________
◼ Fortunately, there are measures that do not suffer from this

shortcoming
14
Quiz Time
◼ Calculation of mean is affected by extreme values
◼ If there are very high or very low values in that data that don’t look
like most of the data, then the mean is not representative
◼ Fortunately, there are measures that do not suffer from this

shortcoming
15
Median
◼ A few extremely large values can inflate the mean
◼ Recall the previous example
◼ The median of a data set is the value in the middle when the data items are arranged in
ascending order
◼ Whenever a data set has extreme values, the median is the preferred measure of central
location
◼ The median is a robust estimator of central location but says nothing about how the
data on either side of its value is spread or dispersed
Median
For an odd number of observations:
26 18 27 12 14 27 19 7 observations
12 14 18 19 26 27 27 in ascending order
the median is the middle value.
Median = 19
Median
For an even number of observations:
26 18 27 12 14 27 30 19 8 observations
12 14 18 19 26 27 27 30 in ascending order
the median is the average of the middle two values.
Median = (19 + 26)/2 = 22.5

Median
◼ Example: Apartment Rents
Averaging the 35th and 36th data values:

Median = (475 + 475)/2 = 475
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: Data is in ascending order.
Mode
◼ The mode of a data set is the value that occurs with greatest frequency.
◼ The greatest frequency can occur at two or more different values.
◼ If the data have exactly two modes, the data are bimodal.
◼ If the data have more than two modes, the data are multimodal.
Exercise – Compute Mean , Median and Mode
21
Exercise – Compute Mean , Median and Mode
Sum 80 Sum 98
Count 16 Count 16
Mean 5 Mean 6.13
Median 5 Median 7
Mode 5 Mode 8
Sum 80
Count 16
Mean 5
Median 5
Mode 2,8
22
Percentiles
◼ A percentile provides information about how the data are spread over the interval
from the smallest value to the largest value.
◼ Admission test scores for colleges and universities are frequently reported in terms of
percentiles.
◼ The pth percentile of a data set is a value such that at least p percent of the items take
on this value or less and at least (100 - p) percent of the items take on this value or
more.
Percentiles
Arrange the data in ascending order.
Compute index i, the position of the pth percentile.

i = (p/100)n
If i is not an integer, round up. The pth percentile

is the value in the ith position.
If i is an integer, the pth percentile is the average

of the values in positions i and i+1.
80th Percentile
i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
80th Percentile
“At least 80% of the “At least 20% of the

items take on a items take on a
value of 542 or less.” value of 542 or more.”
56/70 = .8 or 80% 14/70 = .2 or 20%
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Quartiles
◼ Quartiles are specific percentiles.

◼ First Quartile = 25th Percentile
◼ Second Quartile = 50th Percentile = Median
◼ Third Quartile = 75th Percentile
Third Quartile
Third quartile = 75th percentile

i = (p/100)n = (75/100)70 = 52.5 = 53
Third quartile = 525
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Measures of Variability
Package of Package of
company 1 company 2
25 17
20 16
15 15
10 14
5 13
Avg: 15 Avg: 15
Measures of Variability
◼ Range
◼ Interquartile Range
◼ Variance
◼ Standard Deviation
◼ Coefficient of Variation
Range
◼ The range of a data set is the difference between the largest and smallest data values.
◼ It is the simplest measure of variability.
◼ It is very sensitive to the smallest and largest data values.
Range
Range = largest value - smallest value

Range = 615 - 425 = 190
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Interquartile Range
◼ The interquartile range of a data set is the difference between the third quartile and
the first quartile.
◼ It is the range for the middle 50% of the data.
◼ It overcomes the sensitivity to extreme data values.
Interquartile Range
3rd Quartile (Q3) = 525

1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Variance
◼ The variance is a measure of variability that utilizes all the data.
◼ It is based on the difference between the value of each observation (xi) and the
mean (𝑥ҧ for a sample,  for a population).
◼ The variance is useful in comparing the variability of two or more variables.
Variance - Example
Monthly Salary Sample Mean Deviation about Squared

(in US$) the Mean Deviation About
the Mean
3850 3940 -90 8100
3950 3940 +10 100
4050 3940 +110 12100
3880 3940 -60 3600
3970 3940 +30 900
◼ Sum of deviations about the mean ?

◼ Sum of squared deviations about the mean ?
Variance
The variance is computed as follows:
2  ( xi − x )
2
 ( xi −  )
2
s =  =
2
n −1 N
for a for a
sample population
Variance
The variance is computed as follows:
2  ( xi − x )
2
 ( xi −  )
2
s =  =
2
n −1 N
for a for a
sample population
◼ In most statistical applications, the data being analyzed for a sample.

◼ When we compute a sample variance, we are often interested in using it to estimate the population
variance.
◼ If the sum of the squared deviations about the sample mean is divided by (n-1), and not n, the resulting
sample variance provides an unbiased estimate of the population variance.
Standard Deviation
◼ The standard deviation of a data set is the positive square root of the variance.
◼ It is measured in the same units as the data, making it more easily interpreted than
the variance.
The standard deviation is computed as follows:
s = s2  = 2
for a for a
sample population
Coefficient of Variation
◼ The coefficient of variation indicates how large the standard deviation is relative
to the mean.
The coefficient of variation is computed as follows:

s   
 100  %  100  %
x   
for a for a
sample population
Coefficient of Variation (CoV)
◼ Stock A ◼ Stock B
◼ Five weeks of average ◼ Five weeks of average prices of

prices of stock A are 57, 68, stock B are 12, 17, 8, 15 and 13
64, 71 and 62
◼ Mean ? ◼ Mean ?
◼ Standard Deviation? ◼ Standard Deviation?
◼ CoV? ◼ CoV?
With standard deviation as a measure of risk, Stock ___ is more risky

With CoV as a measure of risk, Stock ____ is more risky
41
Coefficient of Variation (CoV)
◼ Stock A ◼ Stock B
◼ Five weeks of average ◼ Five weeks of average prices of

prices of stock A are 57, 68, stock B are 12, 17, 8, 15 and 13
64, 71 and 62
◼ Mean = 64.4 ◼ Mean =13
◼ Standard Deviation = 4.84 ◼ Standard Deviation = 3.03
◼ CoV = 7.5% ◼ CoV = 23.3%
With standard deviation as a measure of risk, Stock A is more risky

With CoV as a measure of risk, Stock B is more risky
42
Sample Variance, Standard Deviation,
And Coefficient of Variation
◼ Example: Apartment Rents
Variance
s 2
=
 (x i − x )2
= 2, 996.16
n−1
Standard Deviation
the standard
deviation is
s = s 2 = 2996.16 = 54.74
about 11%
of the mean
Coefficient of Variation
s   54.74 
  100  % =   100  % = 11.15%
x   490.80 
Summary of Last Session
✓ Mean
✓ Median
✓ Mode
✓ Percentiles
✓ Quartiles
✓ Range
✓ Interquartile Range
✓ Variance
✓ Standard Deviation
✓ Coefficient of Variation
44
Distribution Shape: Skewness
◼ An important measure of the shape of a distribution is called skewness.

◼ The formula for the skewness of sample data is
 xi − x 
3
n
Skewness =  
(n − 1)(n − 2)  s 

◼ Skewness can be easily computed using statistical software.

Symmetric (not skewed)

◼ Skewness is zero.
◼ Mean and median are equal.
Skewness = 0
.35
.30
Relative Frequency
.25
.20
.15
.10
.05
0
Moderately Skewed Left

◼ Skewness is negative.
◼ Mean will usually be less than the median.
Skewness = − .31
.35
.30
Relative Frequency
.25
.20
.15
.10
.05
0
Moderately Skewed Right
◼ Skewness is positive.
◼ Mean will usually be more than the median.
Skewness = .31
.35
.30
Relative Frequency
.25
.20
.15
.10
.05
0
Highly Skewed Right
◼ Skewness is positive (often above 1.0).
◼ Mean will usually be more than the median.
.35
Skewness = 1.25
.30
Relative Frequency
.25
.20
.15
.10
.05
0
Seventy efficiency apartments were randomly sampled in a college town. The
monthly rent prices for the apartments are listed below in ascending order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
.35 Skewness = .92

.30
Relative Frequency
.25
.20
.15
.10
.05
0
z-Scores
◼ The z-score is often called the standardized value.
◼ It denotes the number of standard deviations a data value xi is away from the mean.
xi − x
zi =
s
𝑥𝑖 = 𝑥ҧ + 𝑠𝑧𝑖
z-Scores
◼ An observation’s z-score is a measure of the relative location of the observation
in a data set.
◼ A data value less than the sample mean will have a z-score less than zero.
◼ A data value greater than the sample mean will have a z-score greater than zero.
◼ A data value equal to the sample mean will have a z-score of zero.
z-Scores
◼ z-Score of Smallest Value (425)
xi − x 425 − 490.80
z= = = − 1.20
s 54.74
Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
Detecting Outliers
◼ An outlier is an unusually small or unusually large value in a data set.
◼ A data value with a z-score less than -3 or greater than +3 might be considered
an outlier.
◼ It might be:
➢ an incorrectly recorded data value
➢ a correctly recorded data value that belongs to the data set
Detecting Outliers
◼ The most extreme z-scores are -1.20 and 2.27
◼ Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.
Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
Detecting Outliers
57
Five-Number Summaries
and Box Plots
Summary statistics and easy-to-draw graphs can be

used to quickly summarize large quantities of data.
Two tools that accomplish this are five-number

summaries and box plots.
1 Smallest Value
2 First Quartile
3 Median
4 Third Quartile
5 Largest Value
Lowest Value = 425 First Quartile = 445

Median = 475
Third Quartile = 525 Largest Value = 615
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Box Plot
A box plot is a graphical summary of data that is based on a five-number summary.
A key to the development of a box plot is the computation of the median and
the quartiles Q1 and Q3.
Box plots provide another way to identify outliers.

Box Plot
◼ A box is drawn with its ends located at the first and third quartiles.
◼ A vertical line is drawn in the box at the location of the median (second quartile).
400 425 450 475 500 525 550 575 600 625
Q1 = 445 Q3 = 525
Q2 = 475
Box Plot
◼ Limits are located (not drawn) using the interquartile range (IQR).
◼ Data outside these limits are considered outliers.
◼ The locations of each outlier is shown with the symbol * .
Box Plot
The lower limit is located 1.5(IQR) below Q1.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
The upper limit is located 1.5(IQR) above Q3.
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
There are no outliers (values less than 325 or greater than 645)
in the apartment rent data.
Box Plot
◼ Whiskers (dashed lines) are drawn from the ends of the box to the
smallest and largest data values inside the limits.
400 425 450 475 500 525 550 575 600 625
Smallest value Largest value

inside limits = 425 inside limits = 615
To draw box plot in Excel:
https://support.microsoft.com/en-us/office/create-a-box-plot-10204530-8cdf-40fe-a711-2eb9785e510f
66
Chebyshev’s Theorem
At least (1 - 1/z2) of the items in any data set will be within z standard deviations of
the mean, where z is any value greater than 1.
Chebyshev’s theorem requires z > 1, but z need not be an integer.

At least ….. of the data values must be

within z = 2 standard deviations of the mean.


At least 75% of the data values must be



Let z = 1.5 with x = 490.80 and s = 54.74
At least (1 − 1/(1.5)2) = 1 − 0.44 = 0.56 or 56%

of the rent values must be between
x - z(s) = 490.80 − 1.5(54.74) = 409
and
x + z(s) = 490.80 + 1.5(54.74) = 573
(Actually, 86% of the rent values are between 409 and 573.)
Empirical Rule
When the data are believed to approximate a bell-shaped distribution …
The empirical rule can be used to determine the percentage of data values that must be within a
specified number of standard deviations of the mean.
The empirical rule is based on the normal distribution, which will be covered later
Empirical Rule
For data having a bell-shaped distribution:

68.26% of the values of a normal random variable
are within +/- 1 standard deviation of its mean.

are within +/- 2 standard deviations of its mean.

are within +/- 3 standard deviations of its mean.
Empirical Rule
99.72%
95.44%
68.26%

x
 – 3  – 1  + 1  + 3
 – 2  + 2
Measures of Association
Between Two Variables
Thus far we have examined numerical methods used to summarize the data for
one variable at a time.
Often a manager or decision maker is interested in the relationship between two

variables.
Two descriptive measures of the relationship between two variables are

covariance and correlation coefficient.
Example: Stereo & Sound Equipment Store
Store’s manager wants to determine relationship between the number of weekend

television commercials shown and the sales at the store during the following week.
75
76
Covariance
The covariance is a measure of the linear association between two variables.
Positive values indicate a positive relationship.
Negative values indicate a negative relationship.

Covariance
The covariance is computed as follows:
 ( xi − x )( yi − y ) for
sxy =
n −1 samples
 ( xi −  x )( yi −  y ) for
 xy = populations
N
Covariance
Store’s manager wants to determine relationship between the number of weekend

television commercials shown and the sales at the store during the following week.
79
Covariance
80
Covariance
➢ Scatter diagram with a vertical dashed line at 𝑥ҧ = 3

➢ Scatter diagram with a horizontal dashed line at 𝑦ത = 51
➢ The lines divide the scatter diagram into four quadrants
81
Covariance
➢ Points in quadrant I correspond to xi greater than 𝑥ҧ and yi greater than 𝑦,

ത points in quadrant II correspond to xi less
than 𝑥ҧ and yi greater than 𝑦,
ത and so on.
➢ Thus, the value of (xi -𝑥)(y
ҧ i -𝑦)
ത must be positive for points in quadrant I, negative for points in quadrant II, positive
for points in quadrant III, and negative for points in quadrant IV.
82
Covariance
➢ If the value of Sxy is +, the points with the greatest influence on Sxy must be in Q1 and Q3. Hence a positive value
for Sxy indicates a positive linear association between x and y.
➢ If the value of Sxy is -, the points with the greatest influence on Sxy must be in Q2 and Q4. Hence a negative value
for Sxy indicates a negative linear association between x and y.
83
Correlation Coefficient
Correlation is a measure of linear association and not necessarily causation.
Just because two variables are highly correlated, it does not mean that one variable
is the cause of the other.
◼ Correlation is NOT causation
85
The correlation coefficient is computed as follows:

sxy  xy
rxy =  xy =
sx s y  x y
for for
samples populations
The correlation coefficient can take on values between -1 and +1.
Values near -1 indicate a strong negative linear relationship.
Values near +1 indicate a strong positive linear relationship.
The closer the correlation is to zero, the weaker the relationship.

Covariance and Correlation Coefficient
Sample Covariance
sxy =
 ( x − x )( y
i i − y)
=
99
= 11
n−1 9
Sample Correlation Coefficient
sxy 11
rxy = = = 0.93
sx sy (1.49)(7.93)
Correlation Coefficient ?
89
Correlation Coefficient ?
➢ Correlation coefficient measures only the strength of linear relationship between two variables.
➢ It is possible for the correlation coefficient to be near zero, suggesting no linear relationship, when the
relationship between the two variables is non-linear.
90
Thank You !!!

Descriptive Statistics - Numerical Measures

Uploaded by

Copyright:

Available Formats

You might also like

Descriptive Statistics - Numerical Measures

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Descriptive Statistics - Numerical Measures

Uploaded by

Copyright:

Available Formats

Data Science for

A sample statistic is referred to

Sum of the values

Sum of the values

◼ Imagine ten folks are sitting on bar stools in a Seattle

◼ Imagine ten folks are sitting on bar stools in a Seattle

Income Freq Total

◼ If I were to describe the patrons of this bar as having an average annual

◼ Calculation of mean is affected by __________ values

◼ Fortunately, there are measures that do not suffer from this

◼ Calculation of mean is affected by extreme values

◼ Fortunately, there are measures that do not suffer from this

For an odd number of observations:

the median is the middle value.

the median is the average of the middle two values.

Median = (19 + 26)/2 = 22.5

Averaging the 35th and 36th data values:

Compute index i, the position of the pth percentile.

If i is not an integer, round up. The pth percentile

If i is an integer, the pth percentile is the average

“At least 80% of the “At least 20% of the

◼ Quartiles are specific percentiles.

Third quartile = 75th percentile

Range = largest value - smallest value

3rd Quartile (Q3) = 525

Monthly Salary Sample Mean Deviation about Squared

◼ Sum of deviations about the mean ?

The variance is computed as follows:

The variance is computed as follows:

◼ In most statistical applications, the data being analyzed for a sample.

The standard deviation is computed as follows:

The coefficient of variation is computed as follows:

◼ Five weeks of average ◼ Five weeks of average prices of

With standard deviation as a measure of risk, Stock ___ is more risky

◼ Five weeks of average ◼ Five weeks of average prices of

With standard deviation as a measure of risk, Stock A is more risky

◼ An important measure of the shape of a distribution is called skewness.

◼ Skewness can be easily computed using statistical software.

Symmetric (not skewed)

Moderately Skewed Left

.35 Skewness = .92

Standardized Values for Apartment Rents

Standardized Values for Apartment Rents

Summary statistics and easy-to-draw graphs can be

Two tools that accomplish this are five-number

Lowest Value = 425 First Quartile = 445

A box plot is a graphical summary of data that is based on a five-number summary.

Box plots provide another way to identify outliers.

The lower limit is located 1.5(IQR) below Q1.

Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325

The upper limit is located 1.5(IQR) above Q3.

Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645

Smallest value Largest value

Chebyshev’s theorem requires z > 1, but z need not be an integer.

At least ….. of the data values must be

At least ….. of the data values must be

At least ….. of the data values must be

At least 75% of the data values must be