Chapter 3 Updated


Statistics for Business and

Economics

Chapter 3
Numerical Measures

© 2011 Pearson Education, Inc


Contents

1. Describing Qualitative Data
2. Graphical Methods for Describing Quantitative Data
3. Summation Notation
4. Numerical Measures of Central Tendency
5. Numerical Measures of Variability
6. Interpreting the Standard Deviation
7. Numerical Measures of Relative Standing
8. Methods for Detecting Outliers: Box Plots and z-scores
9. Graphing Bivariate Relationships
10. The Time Series Plot
11. Distorting the Truth with Descriptive Techniques


Learning Objectives

1. Describe data using graphs


2. Describe data using numerical measures



Summation Notation



Summation Notation
Most formulas we use require a summation of numbers.

$$\sum_{i=1}^{n} x_i$$

Sum the measurements on the variable that appears to the right of the summation symbol, beginning with the 1st measurement and ending with the nth measurement.


Summation Notation
For the data $x_1 = 5$, $x_2 = 3$, $x_3 = 8$, $x_4 = 5$, $x_5 = 4$:

$$\sum_{i=1}^{5} x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2$$
$$= 5^2 + 3^2 + 8^2 + 5^2 + 4^2$$
$$= 25 + 9 + 64 + 25 + 16 = 139$$
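The sum of squares above can be checked with a few lines of Python. This is only a sketch; the variable names are chosen here for illustration.

```python
# Data from the slide: x1..x5
x = [5, 3, 8, 5, 4]

# Sum of the measurements (sigma of x_i)
total = sum(x)

# Sum of the squared measurements (sigma of x_i squared)
sum_of_squares = sum(v ** 2 for v in x)

print(total, sum_of_squares)
```

Note that the sum of squares (139) is not the same as the square of the sum (25 squared).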


Numerical Measures
of Central Tendency



Thinking Challenge

... employees cite low pay -- most workers earn only $20,000.
... President claims average pay is $70,000!

[Figure: salaries shown are $400,000; $70,000; $50,000; $30,000; $20,000]
Two Characteristics
The central tendency of the set of
measurements–that is, the tendency of the data to
cluster, or center, about certain numerical values.

Central Tendency
(Location)



Two Characteristics
The variability of the set of measurements–that
is, the spread of the data.

Variation
(Dispersion)



Standard Notation
Measure   Sample      Population
Mean      $\bar{x}$   $\mu$
Size      $n$         $N$


Mean
1. Most common measure of central tendency
2. Acts as ‘balance point’
3. Affected by extreme values (‘outliers’)
4. Denoted $\bar{x}$, where

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
Mean Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{6}$$
$$= \frac{10.3 + 4.9 + 8.9 + 11.7 + 6.3 + 7.7}{6} = 8.30$$
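As a quick check of the mean example, the same calculation can be sketched in Python. The names `data` and `sample_mean` are illustrative.

```python
# Raw data from the Mean Example slide
data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]

# Mean = sum of the measurements divided by the sample size n
n = len(data)
sample_mean = sum(data) / n

print(round(sample_mean, 2))
```

Rounding is applied only for display, because floating-point sums of decimals are not always exact.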
Median
1. Measure of central tendency
2. Middle value in ordered sequence
• If n is odd, middle value of sequence
• If n is even, average of 2 middle values
3. Position of median in sequence

$$\text{Positioning Point} = \frac{n + 1}{2}$$

4. Not affected by extreme values
Median Example
Odd-Sized Sample
• Raw Data: 24.1 22.6 21.5 23.7 22.6
• Ordered: 21.5 22.6 22.6 23.7 24.1
• Position: 1 2 3 4 5

$$\text{Positioning Point} = \frac{n + 1}{2} = \frac{5 + 1}{2} = 3.0$$
$$\text{Median} = 22.6$$
Median Example
Even-Sized Sample
• Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
• Position: 1 2 3 4 5 6

$$\text{Positioning Point} = \frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5$$
$$\text{Median} = \frac{7.7 + 8.9}{2} = 8.30$$
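The odd and even cases can be sketched as one small Python function. The function name `median` and its structure are illustrative, not taken from the slides.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2  # zero-based index just above the positioning point (n + 1) / 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [24.1, 22.6, 21.5, 23.7, 22.6]
even_sample = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]

print(median(odd_sample))
print(round(median(even_sample), 2))
```

The data must be sorted first; the positioning point refers to positions in the ordered sequence, not in the raw data.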
Mode
1. Measure of central tendency
2. Value that occurs most often
3. Not affected by extreme values
4. May be no mode or several modes
5. May be used for quantitative or qualitative
data



Mode Example
• No Mode
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
• One Mode
Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9
• More Than 1 Mode
Raw Data: 21 28 28 41 43 43

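The three cases above can be reproduced with a short Python sketch. The helper name `modes` is hypothetical, and it assumes the convention that a data set has no mode when every value occurs exactly once.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or an empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: treated here as "no mode"
    return sorted(v for v, c in counts.items() if c == top)

print(modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7]))  # no mode
print(modes([6.3, 4.9, 8.9, 6.3, 4.9, 4.9]))    # one mode
print(modes([21, 28, 28, 41, 43, 43]))          # more than one mode
```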


Thinking Challenge
You’re a financial analyst
for Prudential-Bache
Securities. You have
collected the following
closing stock prices of new
stock issues: 17, 16, 21, 18,
13, 16, 12, 11.
Describe the stock prices
in terms of central
tendency.
Central Tendency Solution*
Mean

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_8}{8}$$
$$= \frac{17 + 16 + 21 + 18 + 13 + 16 + 12 + 11}{8} = 15.5$$
Central Tendency Solution*
Median
• Raw Data: 17 16 21 18 13 16 12 11
• Ordered: 11 12 13 16 16 17 18 21
• Position: 1 2 3 4 5 6 7 8

$$\text{Positioning Point} = \frac{n + 1}{2} = \frac{8 + 1}{2} = 4.5$$
$$\text{Median} = \frac{16 + 16}{2} = 16$$
Central Tendency Solution*

Mode
Raw Data: 17 16 21 18 13 16 12 11

Mode = 16

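The three solutions for the stock prices can be checked together using Python's standard `statistics` module. This is a sketch; the variable names are illustrative.

```python
import statistics

# Closing stock prices from the Thinking Challenge
prices = [17, 16, 21, 18, 13, 16, 12, 11]

price_mean = statistics.mean(prices)
price_median = statistics.median(prices)
price_mode = statistics.mode(prices)

print(price_mean, price_median, price_mode)
```

The mean (15.5) is slightly below the median and mode (both 16).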


Summary of
Central Tendency Measures
Measure   Formula                 Description
Mean      $\sum x_i / n$          Balance Point
Median    Position $(n + 1)/2$    Middle Value When Ordered
Mode      none                    Most Frequent


Shape
1. Describes how data are distributed
2. Measures of Shape
• Skew = Symmetry

Left-Skewed        Symmetric        Right-Skewed
Mean < Median      Mean = Median    Median < Mean

https://www.youtube.com/watch?v=XSSRrVMOqlQ


AM vs. GM

https://www.investopedia.com/ask/answers/06/geometricmean.asp


2.5

Numerical Measures
of Variability



Range
1. Measure of dispersion
2. Difference between largest & smallest
observations
Range = $x_{\text{largest}} - x_{\text{smallest}}$
3. Ignores how data are distributed

[Figure: two data sets plotted on a scale from 7 to 10; for each, Range = 10 – 7 = 3]
Variance &
Standard Deviation
1. Measures of dispersion
2. Most common measures
3. Consider how data are distributed
4. Show variation about mean ($\bar{x}$ or $\mu$)

[Figure: data plotted on a scale from 4 to 12, with the mean marked at $\bar{x} = 8.3$]
Standard Notation
Measure              Sample      Population
Mean                 $\bar{x}$   $\mu$
Standard Deviation   $s$         $\sigma$
Variance             $s^2$       $\sigma^2$
Size                 $n$         $N$
Sample Variance Formula

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

n – 1 in denominator!


Sample Standard Deviation Formula

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}$$


Variance Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \quad \text{where} \quad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 8.3$$
$$s^2 = \frac{(10.3 - 8.3)^2 + (4.9 - 8.3)^2 + \cdots + (7.7 - 8.3)^2}{6 - 1} = 6.368$$
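The variance example can be followed step by step in Python. This sketch spells out the formula rather than calling a library function; the names are illustrative.

```python
# Raw data from the Variance Example slide
data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]

n = len(data)
xbar = sum(data) / n  # sample mean, 8.3

# Squared deviations from the mean
squared_devs = [(x - xbar) ** 2 for x in data]

# Sample variance: divide by n - 1, not n
s2 = sum(squared_devs) / (n - 1)

print(round(s2, 3))
```

The squared deviations are 4, 11.56, 0.36, 11.56, 4 and 0.36, which sum to 31.84; dividing by 5 gives 6.368.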
Thinking Challenge
• You’re a financial analyst
for Prudential-Bache
Securities. You have
collected the following
closing stock prices of
new stock issues: 17, 16,
21, 18, 13, 16, 12, 11.
• What are the variance
and standard deviation
of the stock prices?



Variation Solution*
Sample Variance
Raw Data: 17 16 21 18 13 16 12 11

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \quad \text{where} \quad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 15.5$$
$$s^2 = \frac{(17 - 15.5)^2 + (16 - 15.5)^2 + \cdots + (11 - 15.5)^2}{8 - 1} = 11.14$$
Variation Solution*

Sample Standard Deviation

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{11.14} = 3.34$$
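The same results can be obtained from Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n – 1) denominator. A minimal sketch:

```python
import statistics

# Closing stock prices from the Thinking Challenge
prices = [17, 16, 21, 18, 13, 16, 12, 11]

sample_variance = statistics.variance(prices)  # divides by n - 1
sample_sd = statistics.stdev(prices)           # square root of the sample variance

print(round(sample_variance, 2), round(sample_sd, 2))
```

The population versions (dividing by N) are available as `statistics.pvariance` and `statistics.pstdev`.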


Summary of
Variation Measures
Measure                           Formula                                                    Description
Range                             $x_{\text{largest}} - x_{\text{smallest}}$                 Total Spread
Standard Deviation (Sample)       $\sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$    Dispersion about Sample Mean
Standard Deviation (Population)   $\sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$            Dispersion about Population Mean
Variance (Sample)                 $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$           Squared Dispersion about Sample Mean
• Take three numbers: 1, 2 and 3.
• The mean is 2.
• The differences between the values and the mean are:
  • 1 - 2 = -1
  • 2 - 2 = 0
  • 3 - 2 = 1
• The sum of these differences is -1 + 0 + 1 = 0.
• The zero-sum property states that, whatever numbers you start with, the sum of the differences between them and their mean is 0.
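The zero-sum property holds because the sum of the values equals n times the mean, so subtracting the mean n times leaves nothing. A small Python sketch (the function name is illustrative) shows it for the three numbers above and for the stock prices used elsewhere in this chapter:

```python
def sum_of_deviations(values):
    """Sum of (value - mean) over all values."""
    mean = sum(values) / len(values)
    return sum(v - mean for v in values)

print(sum_of_deviations([1, 2, 3]))
print(sum_of_deviations([17, 16, 21, 18, 13, 16, 12, 11]))
```

With decimal data the result may differ from 0 by a tiny floating-point rounding error. This property is why the variance squares the deviations before summing them.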
2.6

Interpreting the
Standard Deviation



Interpreting Standard Deviation:
Empirical Rule
• Applies to data sets that are mound shaped and
symmetric
• Approximately 68% of the measurements lie in the interval $\bar{x} - s$ to $\bar{x} + s$
• Approximately 95% of the measurements lie in the interval $\bar{x} - 2s$ to $\bar{x} + 2s$
• Approximately 99.7% of the measurements lie in the interval $\bar{x} - 3s$ to $\bar{x} + 3s$


Interpreting Standard Deviation:
Empirical Rule

[Figure: axis marked at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$; approximately 68% of the measurements lie within $s$ of the mean, approximately 95% within $2s$, and approximately 99.7% within $3s$]
Empirical Rule Example
Previously we found the mean closing stock price of new stock issues is 15.5 and the standard deviation is 3.34. If we can assume the data are symmetric and mound shaped, calculate the percentage of the data that lie within the intervals $\bar{x} \pm s$, $\bar{x} \pm 2s$, $\bar{x} \pm 3s$.
Empirical Rule Example
• According to the Empirical Rule, approximately 68% of the data will lie in the interval ($\bar{x} - s$, $\bar{x} + s$),
(15.5 – 3.34, 15.5 + 3.34) = (12.16, 18.84)
• Approximately 95% of the data will lie in the interval ($\bar{x} - 2s$, $\bar{x} + 2s$),
(15.5 – 2∙3.34, 15.5 + 2∙3.34) = (8.82, 22.18)
• Approximately 99.7% of the data will lie in the interval ($\bar{x} - 3s$, $\bar{x} + 3s$),
(15.5 – 3∙3.34, 15.5 + 3∙3.34) = (5.48, 25.52)
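The intervals can be computed, and compared with the eight actual stock prices, in a short Python sketch. The helper names `interval` and `share_within` are hypothetical, and the rounded values 15.5 and 3.34 are taken from the example.

```python
prices = [17, 16, 21, 18, 13, 16, 12, 11]
mean, s = 15.5, 3.34  # rounded values from the earlier solution slides

def interval(k):
    """Interval from mean - k*s to mean + k*s, rounded to 2 decimals."""
    return (round(mean - k * s, 2), round(mean + k * s, 2))

def share_within(k):
    """Fraction of the observed prices that fall inside the k-th interval."""
    low, high = interval(k)
    inside = [p for p in prices if low <= p <= high]
    return len(inside) / len(prices)

for k in (1, 2, 3):
    print(k, interval(k), share_within(k))
```

In this small sample, 5 of the 8 prices (62.5%) fall within one standard deviation and all 8 fall within two. With so few observations, only rough agreement with the 68% and 95% figures should be expected.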
Coefficient of variation / CV

• The coefficient of variation (CV) is the ratio of the standard deviation to the mean. The higher the coefficient of variation, the greater the level of dispersion around the mean. It is generally expressed as a percentage. Without units, it allows for comparison between distributions of values whose scales of measurement are not comparable. When we are presented with estimated values, the CV relates the standard deviation of the estimate to the value of this estimate. The lower the value of the coefficient of variation, the more precise the estimate.
(accessible at: https://www.insee.fr/en/metadonnees/definition/c1366)
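As an illustration, the CV of the stock prices used earlier in this chapter can be sketched in Python. The choice of the sample standard deviation here is an assumption; the definition above does not say which version to use.

```python
import statistics

# Closing stock prices from the earlier Thinking Challenge
prices = [17, 16, 21, 18, 13, 16, 12, 11]

mean = statistics.mean(prices)
sd = statistics.stdev(prices)  # sample standard deviation (assumed choice)

# Coefficient of variation, expressed as a percentage
cv_percent = 100 * sd / mean

print(round(cv_percent, 1))
```

The result, about 21.5%, has no units, so it could be compared with the CV of a series measured on a different scale.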


Covariance
• Variables may change in relation to each
other

• Covariance measures how much the movement in one variable predicts the movement in a corresponding variable

R F Riesenfeld Sp 2010 CS5961 Comp Stat 46


Smoking and Lung Capacity
• Example: investigate relationship between
cigarette smoking and lung capacity

• Data: sample group response data on smoking habits, and measured lung capacities, respectively


Smoking v Lung Capacity Data

N   Cigarettes (X)   Lung Capacity (Y)

1   0                45
2   5                42
3   10               33
4   15               31
5   20               29


Smoking v Lung Capacity
• Observe that as smoking exposure goes
up, corresponding lung capacity goes
down
• Variables covary inversely
• Covariance and Correlation quantify
relationship



Covariance
• Variables that covary inversely, like
smoking and lung capacity, tend to appear
on opposite sides of the group means
– When smoking is above its group mean, lung
capacity tends to be below its group mean.
• Average product of deviation measures
extent to which variables covary, the
degree of linkage between them



The Sample Covariance
• Similar to variance, for theoretical reasons, average is typically computed using (N - 1), not N. Thus,

$$S_{xy} = \frac{1}{N - 1} \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})$$


Calculating Covariance

Cigs (X)   Lung Cap (Y)

0          45
5          42
10         33
15         31
20         29

$\bar{X} = 10$   $\bar{Y} = 36$


Calculating Covariance

Cigs (X)   $(X - \bar{X})$   $(X - \bar{X})(Y - \bar{Y})$   $(Y - \bar{Y})$   Cap (Y)

0          -10               -90                            9                 45
5          -5                -30                            6                 42
10         0                 0                              -3                33
15         5                 -25                            -5                31
20         10                -70                            -7                29
                             ∑ = -215
Covariance Calculation (2)

Evaluation yields,

$$S_{xy} = \frac{1}{4}(-215) = -53.75$$
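The covariance calculation can be reproduced in a few lines of Python. This sketch follows the sample covariance formula directly; the variable names are illustrative.

```python
# Smoking and lung capacity data from the slides
cigs = [0, 5, 10, 15, 20]       # X
lung = [45, 42, 33, 31, 29]     # Y

n = len(cigs)
x_bar = sum(cigs) / n   # 10
y_bar = sum(lung) / n   # 36

# Products of paired deviations from the two means
products = [(x - x_bar) * (y - y_bar) for x, y in zip(cigs, lung)]

# Sample covariance: divide the sum of products by N - 1
s_xy = sum(products) / (n - 1)

print(products, s_xy)
```

The negative result reflects the inverse relationship: above-average smoking is paired with below-average lung capacity.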


2.7

Numerical Measures
of Relative Standing



Numerical Measures of
Relative Standing: Percentiles
• Describes the relative location of a
measurement compared to the rest of the data
• The pth percentile is a number such that p% of
the data falls below it and (100 – p)% falls
above it
• Median = 50th percentile



Percentile Example
• You scored 560 on the GMAT exam. This
score puts you in the 58th percentile.
• What percentage of test takers scored lower
than you did?
• What percentage of test takers scored higher
than you did?



Percentile Example
• What percentage of test takers scored lower
than you did?
58% of test takers scored lower than 560.
• What percentage of test takers scored higher
than you did?
(100 – 58)% = 42% of test takers scored
higher than 560.

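The idea of "percentage below" and "percentage above" can be sketched in Python. No GMAT data are given in the slides, so this example reuses the stock prices from earlier in the chapter; the helper names are hypothetical.

```python
def percent_below(values, score):
    """Percentage of values strictly less than score."""
    return 100 * sum(v < score for v in values) / len(values)

def percent_above(values, score):
    """Percentage of values strictly greater than score."""
    return 100 * sum(v > score for v in values) / len(values)

prices = [17, 16, 21, 18, 13, 16, 12, 11]

below = percent_below(prices, 17)
above = percent_above(prices, 17)

print(below, above)
```

In a small data set the two percentages need not add up to 100, because the score itself (and any ties) takes up part of the data.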


Deciles Example
Deciles divide a set of observations into 10 equal parts and
percentiles into 100 equal parts. So if you found that your GPA
was in the 8th decile at your university, you could conclude that
80% of the students had a GPA lower than yours and 20%
had a higher GPA.



2.8

Methods for Detecting Outliers:


Box Plots and z-Scores



Outlier
An observation (or measurement) that is unusually large
or small relative to the other values in a data set is
called an outlier. Outliers typically are attributable to
one of the following causes:
1. The measurement is observed, recorded, or entered
into the computer incorrectly.
2. The measurement comes from a different
population.
3. The measurement is correct but represents a rare
(chance) event.
Quartiles
Measure of noncentral tendency
Split ordered data into 4 quarters

25% | Q1 | 25% | Q2 | 25% | Q3 | 25%

Lower quartile $Q_L$ is the 25th percentile.
Middle quartile $m$ is the median.
Upper quartile $Q_U$ is the 75th percentile.
Interquartile range: $IQR = Q_U - Q_L$
Interquartile Range
1. Measure of dispersion
2. Also called midspread
3. Difference between third & first quartiles
• Interquartile Range = Q3 – Q1
4. Spread in middle 50%
5. Not affected by extreme values

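Quartiles and the interquartile range can be sketched with Python's standard `statistics.quantiles` function, applied here to the stock prices from earlier in the chapter. Several conventions for locating quartiles exist; the default used below may give slightly different values from a textbook or another software package.

```python
import statistics

prices = [17, 16, 21, 18, 13, 16, 12, 11]

# n=4 splits the data into quarters and returns the three cut points
q1, q2, q3 = statistics.quantiles(prices, n=4)

# Interquartile range: spread of the middle 50% of the data
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

The middle cut point (16) matches the median found earlier, and the IQR is unaffected by how extreme the smallest and largest prices are.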


What is Kurtosis?

A normal distribution has excess kurtosis equal to zero, a leptokurtic distribution has excess kurtosis greater than zero, and platykurtic distributions will have excess kurtosis less than zero.

“Greater positive kurtosis and more negative skew in returns distributions indicates increased risk.”
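The slides give no formula for excess kurtosis. One common moment-based definition (an assumption here, not taken from the slides) is the fourth central moment divided by the squared second central moment, minus 3. A minimal Python sketch with a hypothetical function name:

```python
def excess_kurtosis(values):
    """Moment-based (population) excess kurtosis: m4 / m2**2 - 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Evenly spread values have light tails, so the result is negative (platykurtic)
flat = [1, 2, 3, 4, 5]
print(round(excess_kurtosis(flat), 2))
```

Statistical packages often apply a small-sample correction, so their output can differ from this simple version.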
