Analysing Data
Methods
Summary Measures
Coefficient of Variation
2.1 Measures of Central Tendency
Overview
Central Tendency
Mean: arithmetic average, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Median: midpoint of ranked values
Mode: most frequently observed value
Arithmetic Mean
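The arithmetic mean (mean) is the most common measure of central tendency.
For a population of N values:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$
where $x_1, x_2, \ldots, x_N$ are the population values and $N$ is the population size.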
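Example (figure: two data sets on a 0 to 10 number line):
(1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3, so Mean = 3
(1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4, so Mean = 4
Replacing the largest value 5 with the extreme value 10 pulls the mean up from 3 to 4.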
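A minimal Python sketch of the mean formula, applied to the two small data sets above. The helper name `arithmetic_mean` is hypothetical; the standard-library `statistics.mean` is shown alongside as an equivalent.

```python
import statistics

def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

print(arithmetic_mean([1, 2, 3, 4, 5]))   # 3.0
print(arithmetic_mean([1, 2, 3, 4, 10]))  # 4.0 (one extreme value pulls the mean up)
print(statistics.mean([1, 2, 3, 4, 10]))  # standard-library equivalent
```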
Median
Figure: two data sets on a 0 to 10 number line; Median = 3 in both.
Median (continued)
Median Example (continued)
Data array:
4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24
Note that n = 13
Find the position i = (1/2)n:
i = (1/2)(13) = 6.5
Since 6.5 is not an integer, round up to 7
The median is the value in the 7th position:
Md = 12
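The position rule above can be sketched in Python as follows. `median_by_position` is a hypothetical helper; the odd-n branch follows the slide (round i up), while the even-n branch (average the values at positions i and i + 1) is an assumed convention that this example does not show.

```python
import math

def median_by_position(values):
    """Median via the position rule i = (1/2)n on the ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    if n % 2 == 1:
        # i = n/2 is not an integer (e.g. 13/2 = 6.5), so round up to 7
        i = math.ceil(n / 2)
        return ranked[i - 1]  # positions are 1-based, list indices 0-based
    # Assumed convention for even n: i = n/2 is an integer,
    # so average the values at positions i and i + 1
    i = n // 2
    return (ranked[i - 1] + ranked[i]) / 2

data = [4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24]
print(median_by_position(data))  # 12
```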
Shape of a Distribution
Describes how data is distributed
Symmetric or skewed
Mode
A measure of location
The value that occurs most often
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes
Figure: one data set with Mode = 5; another with no mode.
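A small Python sketch that returns every mode, so both "no mode" and "several modes" can be seen. The helper `modes` and the data values are hypothetical illustrations; treating "no value repeats" as the no-mode case is an assumed convention. It works for categorical labels as well as numbers.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often (sorted).

    Returns [] when no value repeats (assumed "no mode" convention)
    and several values when there is a tie.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(value for value, count in counts.items() if count == top)

print(modes([1, 5, 5, 5, 7]))                            # [5]
print(modes([1, 2, 3]))                                  # []
print(modes(["red", "blue", "red", "blue", "green"]))    # ['blue', 'red']
```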
Weighted Mean
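Example: Sample of 26 Repair Projects

Days to Complete   Frequency
5                  4
6                  12
7                  8
8                  2

Weighted mean days to complete:
$$\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} \approx 6.31 \text{ days}$$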
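The same calculation as a short Python sketch; `weighted_mean` is a hypothetical helper, with the frequencies from the table acting as the weights $w_i$.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * x_i divided by the sum of w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

days = [5, 6, 7, 8]          # days to complete
frequency = [4, 12, 8, 2]    # number of projects (the weights)
print(round(weighted_mean(days, frequency), 2))  # 6.31
```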
Geometric Mean
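Geometric mean
Used to measure the rate of change of a variable over time
$$\bar{x}_g = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n} = (x_1 \times x_2 \times \cdots \times x_n)^{1/n}$$
Geometric mean rate of return
Measures the status of an investment over time as an average rate per period
$$r_g = (x_1 \times x_2 \times \cdots \times x_n)^{1/n} - 1$$
where $x_i$ is the growth factor (1 + rate of return) for period $i$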
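A minimal Python sketch of both formulas. The helper names are hypothetical, and the returns used below (a 50% loss followed by a 100% gain) are an illustrative assumption, not data from the slides: the investment ends where it started, so the geometric mean rate is 0%, while the arithmetic mean of the two returns would suggest 25%.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs positive values")
    return math.prod(values) ** (1 / len(values))

def geometric_mean_return(returns):
    """r_g = (x_1 * ... * x_n)^(1/n) - 1 with growth factors x_i = 1 + r_i.

    Returns are decimals, e.g. -0.5 for -50%.
    """
    return geometric_mean([1 + r for r in returns]) - 1

returns = [-0.5, 1.0]                      # hypothetical: -50%, then +100%
print(geometric_mean_return(returns))      # 0.0
print(sum(returns) / len(returns))         # 0.25 (arithmetic mean, misleading here)
```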
Example (continued)
House prices (figure): $2,000,000; $500,000; $300,000; $100,000; $100,000
Review Example:
Summary Statistics
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000 (Sum = $3,000,000)
Mean: $3,000,000 / 5 = $600,000
Median: middle value of ranked data = $300,000
Which measure of location is the “best”?
Other Location Measures
Other Measures of Location: Percentiles, Quartiles
Percentiles
The pth percentile is the value at position $i = \frac{p}{100}\,n$ in the ranked data.
If i is not an integer, round up to the next higher integer value.
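A Python sketch of the position rule above, using the 13-value data array from the median example. `percentile` is a hypothetical helper for integer p; the branch for an integer i (average the values at positions i and i + 1) is an assumed convention the slide does not state. Integer arithmetic via `divmod` avoids floating-point surprises when testing whether i is an integer.

```python
def percentile(values, p):
    """p-th percentile (integer p, 0 < p <= 100) via i = (p/100)n on ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0 or not 0 < p <= 100:
        raise ValueError("need data and 0 < p <= 100")
    whole, remainder = divmod(p * n, 100)  # i = whole + remainder/100
    if remainder:
        # i is not an integer: round up to position whole + 1 (index whole)
        return ranked[whole]
    if whole == n:
        return ranked[-1]  # p = 100: the largest value
    # Assumed rule for integer i: average positions i and i + 1
    return (ranked[whole - 1] + ranked[whole]) / 2

data = [4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24]
print(percentile(data, 25), percentile(data, 50), percentile(data, 90))  # 5 12 23
```

The 50th percentile agrees with the median Md = 12 found earlier.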
Quartiles
Quartiles split the ranked data into 4 segments with
an equal number of values per segment
Figure: ranked data split at Q1, Q2 and Q3 into four segments of 25% each
The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
Q2 is the same as the median (50% are smaller, 50% are
larger)
Only 25% of the observations are greater than the third
quartile
Quartiles
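Example (n = 9): Q1 = 25th percentile, so find $i = \frac{25}{100}(9) = 2.25$
Since 2.25 is not an integer, round up: Q1 is the value in the 3rd position of the ranked data.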
Quartile Formulas
Measures of Variation
Variation
Sample Variance
Sample Standard Deviation
Variation
Figure: same center, different variation
Measuring variation
Range
Range = largest value - smallest value
Example (figure: data on a 0 to 14 number line):
Range = 14 - 1 = 13
Disadvantages of the Range
Ignores the way in which data are distributed
Figure: two data sets spread differently between 7 and 12 have the same range, Range = 12 - 7 = 5
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
Interquartile Range
Interquartile range (IQR) = Q3 - Q1, the spread of the middle 50% of the ranked data
Interquartile Range Example
Example (box plot): minimum = 12, Q1 = 30, median (Q2) = 45, Q3 = 57, maximum = 70; each of the four segments contains 25% of the data.
Interquartile range = 57 - 30 = 27
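A Python sketch contrasting the range with the interquartile range on the two 25-value lists from the "Disadvantages of the Range" slide. It includes a `percentile` helper (hypothetical name) based on the $i = \frac{p}{100}n$ position rule, with averaging of positions i and i + 1 as an assumed rule for integer i. The single outlier moves the range from 4 to 119, while the IQR stays at 1.

```python
def percentile(values, p):
    """p-th percentile (integer p, 0 < p <= 100) via i = (p/100)n on ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0 or not 0 < p <= 100:
        raise ValueError("need data and 0 < p <= 100")
    whole, remainder = divmod(p * n, 100)
    if remainder:                 # i not an integer: round up to position whole + 1
        return ranked[whole]
    if whole == n:                # p = 100: the largest value
        return ranked[-1]
    return (ranked[whole - 1] + ranked[whole]) / 2  # assumed rule for integer i

def interquartile_range(values):
    """IQR = Q3 - Q1."""
    return percentile(values, 75) - percentile(values, 25)

def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
without_outlier = base + [5]
with_outlier = base + [120]
for data in (without_outlier, with_outlier):
    print(data_range(data), interquartile_range(data))
# 4 1
# 119 1
```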
Variance
Sample variance:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Population Variance
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where:
μ = population mean
N = population size
$x_i$ = ith value of the variable x
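A Python sketch of the two variance formulas side by side; the only difference is the divisor (N for a population, n - 1 for a sample). The helper names are hypothetical, the data are the eight sample values from the calculation example in this section, and the results are cross-checked against `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics

def population_variance(values):
    """sigma^2 = sum (x_i - mu)^2 / N"""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

def sample_variance(values):
    """s^2 = sum (x_i - xbar)^2 / (n - 1)"""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

data = [10, 12, 14, 15, 17, 18, 18, 24]
print(population_variance(data))          # 16.25
print(round(sample_variance(data), 4))    # 18.5714
print(math.isclose(sample_variance(data), statistics.variance(data)))  # True
```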
Population Standard Deviation
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
Sample Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data
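Calculation Example: Sample Standard Deviation
Sample data ($x_i$): 10, 12, 14, 15, 17, 18, 18, 24
n = 8, mean $\bar{x} = 16$
$$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + \cdots + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$$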
Figure: three data sets on an 11 to 21 scale with the same mean but different spread
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.926
Data C: Mean = 15.5, s = 4.570
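The calculation example can be reproduced with a short Python sketch; `sample_std_dev` is a hypothetical helper that follows the formula step by step, and `statistics.stdev` serves as a cross-check.

```python
import math
import statistics

def sample_std_dev(values):
    """s = sqrt( sum (x_i - xbar)^2 / (n - 1) )"""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    xbar = sum(values) / n
    sum_sq = sum((x - xbar) ** 2 for x in values)  # 130 for the data below
    return math.sqrt(sum_sq / (n - 1))

data = [10, 12, 14, 15, 17, 18, 18, 24]
s = sample_std_dev(data)
print(round(s, 4))                              # 4.3095
print(math.isclose(s, statistics.stdev(data)))  # True
```

The result is in the same units as the data, which is why the standard deviation is usually reported instead of the variance.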
Advantages of Variance and Standard Deviation
Coefficient of Variation
Measures relative variation
Always expressed as a percentage (%)
Shows variation relative to the mean
Used to compare two or more sets of data measured in different units
Population: $CV = \frac{\sigma}{\mu} \times 100\%$
Sample: $CV = \frac{s}{\bar{x}} \times 100\%$
Comparing Coefficient of Variation
Stock A:
Average price last year = $50
Standard deviation = $5
$CV_A = \frac{s}{\bar{x}} \times 100\% = \frac{\$5}{\$50} \times 100\% = 10\%$
Stock B:
Average price last year = $100
Standard deviation = $5
$CV_B = \frac{s}{\bar{x}} \times 100\% = \frac{\$5}{\$100} \times 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its price.
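A one-function Python sketch of the sample CV, reproducing the stock comparison. `coefficient_of_variation` is a hypothetical helper; it assumes a nonzero mean, since CV is undefined when the mean is zero.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (standard deviation / mean) * 100, returned as a percentage."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * std_dev / mean

print(coefficient_of_variation(5, 50))   # 10.0 (stock A)
print(coefficient_of_variation(5, 100))  # 5.0  (stock B)
```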
The Empirical Rule
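If the data distribution is bell-shaped, the interval μ ± 1σ contains about 68% of the values in the population or the sample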
The Empirical Rule
μ ± 2σ contains about 95% of the values in the population or the sample
μ ± 3σ contains almost all (about 99.7%) of the values in the population or the sample
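The percentages in the empirical rule come from the normal distribution, which is assumed here as the model for "bell-shaped". This Python sketch computes the share within μ ± kσ as `math.erf(k / sqrt(2))`; the helper names are hypothetical, and the μ = 100, σ = 15 values reuse the IQ example later in this section.

```python
import math

def normal_coverage(k):
    """Fraction of a normal distribution within mu +/- k*sigma: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

def empirical_interval(mu, sigma, k):
    """The interval mu +/- k*sigma."""
    return (mu - k * sigma, mu + k * sigma)

for k in (1, 2, 3):
    low, high = empirical_interval(100, 15, k)
    print(f"mu +/- {k} sigma = [{low}, {high}] covers {100 * normal_coverage(k):.2f}%")
# mu +/- 1 sigma = [85, 115] covers 68.27%
# mu +/- 2 sigma = [70, 130] covers 95.45%
# mu +/- 3 sigma = [55, 145] covers 99.73%
```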
Standardized Data Values
Standardized Population Values
$$z = \frac{x - \mu}{\sigma}$$
where:
x = original data value
μ = population mean
σ = population standard deviation
z = standard score
Standardized Sample Values
$$z = \frac{x - \bar{x}}{s}$$
where:
x = original data value
$\bar{x}$ = sample mean
s = sample standard deviation
z = standard score
Standardized Value Example
IQ scores in a large population have a bell-shaped distribution with mean μ = 100 and standard deviation σ = 15.
Find the standardized score (z-score) for a person with an IQ of 121.
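A Python sketch of standardization. For the IQ question the population formula gives z = (121 - 100)/15 = 1.4, so an IQ of 121 lies 1.4 standard deviations above the mean. The same hypothetical `z_score` helper also standardizes sample values once $\bar{x}$ and s are estimated from the data (here, the eight values from the standard deviation example).

```python
import statistics

def z_score(x, center, spread):
    """Standard score z = (x - center) / spread."""
    if spread <= 0:
        raise ValueError("spread must be positive")
    return (x - center) / spread

# Population values: IQ with mu = 100, sigma = 15
print(z_score(121, 100, 15))  # 1.4

# Sample values: standardize with xbar and s estimated from the data
data = [10, 12, 14, 15, 17, 18, 18, 24]
xbar, s = statistics.mean(data), statistics.stdev(data)
print([round(z_score(x, xbar, s), 2) for x in data])
```

Standardized sample values always sum to zero, because the deviations from $\bar{x}$ sum to zero.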
Using Excel
Using Excel
Enter input range details
Click OK
Excel output
Microsoft Excel descriptive statistics output, using the house price data:
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
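For readers without Excel, a similar summary can be sketched with Python's standard `statistics` module. This is an analogue, not Excel itself: `describe` is a hypothetical helper, it covers only part of Excel's output (kurtosis and skewness are omitted), and the labels simply mirror common descriptive-statistics row names.

```python
import math
import statistics

def describe(values):
    """Descriptive summary of a data set (sample formulas, divisor n - 1)."""
    n = len(values)
    s = statistics.stdev(values)  # sample standard deviation
    return {
        "Mean": statistics.mean(values),
        "Standard Error": s / math.sqrt(n),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        "Standard Deviation": s,
        "Sample Variance": statistics.variance(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
for label, value in describe(house_prices).items():
    print(f"{label:<20}{value:,.2f}")
```

The mean ($600,000) sits well above the median ($300,000) because of the single $2,000,000 house.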