Descriptive Statistics
Descriptive Statistics Measures
Mean, Median, Mode
Variance
Standard Deviation
Coefficient of Variation
Calculating the Mean, Median and Mode for ungrouped data
The Sample Mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
where n = sample size (the number of observations).
Example 1
For this sample data, find the sample mean:
x1 = 2, x2 = 3, x3 = 5, x4 = 1, x5 = 4, x6 = 3, x7 = 2, x8 = 4
Σxi = 24 and n = 8, so
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{24}{8} = 3$$
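A minimal Python sketch of the sample-mean formula, applied to the Example 1 data. The function name `sample_mean` is illustrative only; the standard library's `statistics.mean` is used just as a cross-check.

```python
import math
import statistics

def sample_mean(xs):
    """Sample mean: the sum of the observations divided by n."""
    if not xs:
        raise ValueError("need at least one observation")
    return sum(xs) / len(xs)

data = [2, 3, 5, 1, 4, 3, 2, 4]    # Example 1: x1 ... x8
print(sample_mean(data))            # 3.0

# Cross-check against the standard library implementation.
assert math.isclose(sample_mean(data), statistics.mean(data))
```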
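Example 2
Data set A: 1, 2, 3, 4, 5
$$\text{Mean} = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$$
Data set B: 1, 2, 3, 4, 10
$$\text{Mean} = \frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$$
Replacing 5 with the extreme value 10 pulls the mean from 3 up to 4: the mean is affected by extreme values.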
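Measures of Central Tendency: The Median
$$\text{Median position} = \frac{n + 1}{2} \text{ position in the ordered data}$$
Find the median value at that position. If n is even, the position falls halfway between two observations, and the median is the average of those two middle values.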
Example 1
Find the median for the following data set.
27 38 12 34 42 40 24 40 23
The ordered set becomes
Observation 12 23 24 27 34 38 40 40 42
Rank 1 2 3 4 5 6 7 8 9
The median position is $\frac{9 + 1}{2} = 5$, i.e. the 5th ranked observation.
Therefore the median = 34.
Example 2
Find the median for the following data set.
24 31 27 25 35 33 26 40 25 28
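A minimal sketch of the (n + 1)/2 position rule in Python, run on both median examples. The helper name `median_by_position` is hypothetical; for the even-sized Example 2 it averages the 5th and 6th ranked values, giving 27.5.

```python
import statistics

def median_by_position(xs):
    """Median via the (n + 1)/2 position rule on the ordered data."""
    ordered = sorted(xs)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    pos = (n + 1) / 2          # 1-based position in the ordered data
    lower = int(pos)           # rank at or just below the median position
    if pos == lower:           # odd n: the position is a whole rank
        return ordered[lower - 1]
    # even n: halfway between ranks `lower` and `lower + 1`
    return (ordered[lower - 1] + ordered[lower]) / 2

example1 = [27, 38, 12, 34, 42, 40, 24, 40, 23]
example2 = [24, 31, 27, 25, 35, 33, 26, 40, 25, 28]
print(median_by_position(example1))   # 34
print(median_by_position(example2))   # 27.5

# The standard library applies the same rule.
assert median_by_position(example2) == statistics.median(example2)
```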
Properties of the Median
In an ordered array, the median is the "middle" number (50% above, 50% below).
Uniqueness: there is only one median for each set of data.
Not affected by extreme values: for the data sets 1, 2, 3, 4, 5 and 1, 2, 3, 4, 10, Median = 3 in both cases.
Measures of Central Tendency: The Mode
The mode is the most frequently occurring value in a set of observations. A data set may have no mode, or more than one mode.
[Figure: two dot plots. Left (axis 0 to 14): Mode = 9. Right (axis 0 to 6): No Mode.]
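A small Python sketch of finding the mode(s). It assumes the convention that a data set has no mode when no value occurs more than once, and it returns every tied value otherwise; the function name `modes` is illustrative. Note that the standard library's `statistics.multimode` differs here: it returns all values when none repeats.

```python
from collections import Counter

def modes(xs):
    """Return every most-frequent value; [] means no mode (no value repeats)."""
    counts = Counter(xs)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:               # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 3, 5, 1, 4, 3, 2, 4]))   # [2, 3, 4]
print(modes([1, 2, 3, 4, 5]))            # []
```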
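Measures of Central Tendency: Review Example
Arithmetic mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Median: middle value in the ordered array
Mode: most frequently observed value
Geometric mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$, used for the rate of change of a variable over time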
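The geometric mean is commonly applied to growth factors (1 + rate of change). The sketch below assumes that use; the two yearly rates (a 50% fall followed by a 100% rise) are hypothetical values for illustration, not data from these notes, and `geometric_mean` is an illustrative name.

```python
import math
import statistics

def geometric_mean(xs):
    """(x1 * x2 * ... * xn) ** (1/n); all values must be positive."""
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs positive values")
    return math.prod(xs) ** (1 / len(xs))

# Hypothetical example: a value falls 50% in year 1, then rises 100% in year 2.
growth_factors = [1 - 0.50, 1 + 1.00]              # 0.5 and 2.0
g = geometric_mean(growth_factors)
print(g)                                            # 1.0
print(f"mean rate of change: {g - 1:.1%}")          # mean rate of change: 0.0%
print(statistics.fmean([-0.50, 1.00]))              # 0.25 (arithmetic mean of the rates)
```

The value ends where it started, so the average rate of change over time is 0%; the arithmetic mean of the rates (25%) overstates it.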
Measures of Dispersion for
ungrouped data
Measures of Dispersion
The measures of central tendency, such
as the mean, median and mode, do not
reveal the whole picture of the
distribution of the dataset.
[Figure: Dataset 1 and Dataset 2]
Measures of Dispersion
Population 1: narrow range, smaller variation, smaller deviation; observations clustered.
Population 2: wide range, larger variation, larger deviation; observations spread out.
Same centre, different variation.
Measures of Dispersion:
Summary Characteristics
The more the data are spread out, the
greater the range, variance, and
standard deviation.
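Measures of Dispersion: The Range
The range is the difference between the largest and the smallest observations:
$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$
Example: for data plotted on a 0 to 14 axis, with smallest value 1 and largest value 13,
Range = 13 − 1 = 12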
Measures of Dispersion: Why the Range Can Be Misleading
It ignores the way in which the data are distributed.
[Figure: two data sets on a 7 to 12 axis with different distributions.]
Range = 12 − 7 = 5 for both data sets.
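Measures of Dispersion: Why the Range Can Be Misleading
It is sensitive to outliers.
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5
Range = 5 − 1 = 4
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 120
Range = 120 − 1 = 119
Changing a single value (5 to 120) raises the range from 4 to 119.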
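A minimal Python sketch of the range, rebuilding the two 25-value data sets above to show how one outlier changes it. The name `value_range` is illustrative (chosen so as not to shadow Python's built-in `range`).

```python
def value_range(xs):
    """Range = largest observation - smallest observation."""
    if not xs:
        raise ValueError("need at least one observation")
    return max(xs) - min(xs)

# Eleven 1s, eight 2s, four 3s and one 4, shared by both data sets.
base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
without_outlier = base + [5]
with_outlier = base + [120]
print(len(with_outlier))               # 25
print(value_range(without_outlier))    # 4
print(value_range(with_outlier))       # 119
```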
The Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where $\bar{x}$ = arithmetic mean, n = sample size, and $x_i$ = ith observation of the variable X.
Example: For the sample data 2, 3, 5, 1, 4, 3, 2, 4, find:
1. Sample variance
2. Sample standard deviation
xi      xi − x̄          (xi − x̄)²
2       2 − 3 = −1      1
3       3 − 3 = 0       0
5       5 − 3 = 2       4
1       1 − 3 = −2      4
4       4 − 3 = 1       1
3       3 − 3 = 0       0
2       2 − 3 = −1      1
4       4 − 3 = 1       1
Σ 24    0               12
Solution
$$\text{Sample mean} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{24}{8} = 3$$
$$\text{Sample variance} = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{12}{7} = 1.714$$
The Sample Standard Deviation
Most commonly used measure of variation
Tells us how much observations in our sample
differ from the mean value within our sample.
Has the same units as the original data
$$s = \sqrt{s^2}$$
Solution
With sample mean $\bar{x} = 3$ and sample variance $s^2 = 12/7 = 1.714$, as above:
$$\text{Sample standard deviation} = s = \sqrt{s^2} = \sqrt{1.714} = 1.309$$
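A minimal Python sketch of the sample variance and standard deviation formulas (dividing by n − 1), checked against the worked example. The function names are illustrative; `statistics.variance` and `statistics.stdev` from the standard library also use n − 1 and serve as a cross-check.

```python
import math
import statistics

def sample_variance(xs):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def sample_sd(xs):
    """s = sqrt(s^2), in the same units as the data."""
    return math.sqrt(sample_variance(xs))

data = [2, 3, 5, 1, 4, 3, 2, 4]
print(round(sample_variance(data), 3))   # 1.714
print(round(sample_sd(data), 3))         # 1.309

# Cross-check against the standard library (which also divides by n - 1).
assert math.isclose(sample_variance(data), statistics.variance(data))
assert math.isclose(sample_sd(data), statistics.stdev(data))
```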
Measures of Dispersion: Comparing Standard Deviations
Data A (plotted on an 11 to 21 axis): mean = 15.5, s = 3.338
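Measures of Dispersion: The Coefficient of Variation
The coefficient of variation (CV) measures variation relative to the mean. Because it is a percentage with no units, it can compare the variation of data measured in different units.
$$CV = \frac{s}{\bar{x}} \times 100\%$$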
Example 1
The yearly salaries of all employees who work for
a company have a mean of $62,350 and a
standard deviation of $6820. The years of
experience for the same employees have a mean
of 15 years and a standard deviation of 2 years. Is
the relative variation in the salaries larger or
smaller than that in the years of experience for
these employees?
Example 2
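                     Stock A   Stock B
Average price        $50       $100
Standard deviation   $5        $5
Measures of Dispersion: Comparing Coefficients of Variation
$$CV_A = \frac{s}{\bar{x}} \times 100\% = \frac{5}{50} \times 100\% = 10\%$$
$$CV_B = \frac{s}{\bar{x}} \times 100\% = \frac{5}{100} \times 100\% = 5\%$$
Both stocks have the same standard deviation, but stock B is less variable relative to its average price.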
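A minimal Python sketch of the CV formula, applied to both examples. The function name `coefficient_of_variation` is illustrative, and the sketch assumes a positive mean (the CV is not meaningful otherwise).

```python
def coefficient_of_variation(sd, mean):
    """CV = (s / mean) * 100, in percent; assumes a positive mean."""
    if mean <= 0:
        raise ValueError("CV needs a positive mean")
    return 100 * sd / mean

# Example 1: salaries vs years of experience
cv_salary = coefficient_of_variation(6820, 62350)
cv_experience = coefficient_of_variation(2, 15)
print(f"salaries: {cv_salary:.2f}%, experience: {cv_experience:.2f}%")
# salaries: 10.94%, experience: 13.33%

# Example 2: stock A vs stock B
print(coefficient_of_variation(5, 50), coefficient_of_variation(5, 100))   # 10.0 5.0
```

For Example 1, the salaries' CV (about 10.9%) is below the experience CV (about 13.3%), so the relative variation in salaries is smaller.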
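Quartile Measures: Locating Quartiles
Each quartile is found from its position in the ranked data, where n is the number of observed values:
First quartile position: $Q_1 = 0.25(n + 1)$
Second quartile position: $Q_2 = 0.5(n + 1)$ (the median position)
Third quartile position: $Q_3 = 0.75(n + 1)$
When the position is not a whole number, interpolate between the two neighbouring ranked values.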
Quartile Measures:
The Interquartile Range (IQR)
Because the range can be distorted by outliers (extreme values), a modified range that excludes these outliers is often calculated. The interquartile range covers only the middle 50% of the ranked data:
$$IQR = Q_3 - Q_1$$
Quartile Measures:
The Interquartile Range (IQR)
The IQR is also called the 50%
midspread.
Example 1
Find:
1. Q1 and Q3
2. The IQR
Locating the First Quartile, Q1
Ranked data (n = 9): 11 12 13 16 16 17 18 21 22
Q1 is in position 0.25(9 + 1) = 2.5 of the ranked data, so use the value halfway between the 2nd and 3rd values:
Q1 = (12 + 13)/2 = 12.5
Locating the Third Quartile, Q3
Ranked data (n = 9): 11 12 13 16 16 17 18 21 22
Q3 is in position 0.75(9 + 1) = 7.5 of the ranked data, so use the value halfway between the 7th and 8th values:
Q3 = (18 + 21)/2 = 19.5
The Interquartile Range (IQR)
$$IQR = Q_3 - Q_1 = 19.5 - 12.5 = 7.0$$
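Example 2
Find:
1. Q1 and Q3
2. The IQR
Ranked data (n = 12): 7 8 9 10 11 12 13 13 14 17 17 45
Locating the First Quartile, Q1
Q1 is in position 0.25(12 + 1) = 3.25, one quarter of the way from the 3rd value (9) to the 4th value (10):
Q1 = 9 + 0.25(10 − 9) = 9.25
Locating the Third Quartile, Q3
Q3 is in position 0.75(12 + 1) = 9.75, three quarters of the way from the 9th value (14) to the 10th value (17):
Q3 = 14 + 0.75(17 − 14) = 16.25
The Interquartile Range (IQR)
$$IQR = Q_3 - Q_1 = 16.25 - 9.25 = 7.0$$
The outlier 45 does not affect the IQR, although it stretches the range to 45 − 7 = 38.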
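A minimal Python sketch of the 0.25(n + 1) and 0.75(n + 1) position rule with linear interpolation, reproducing both examples. The names `quartile` and `iqr` are illustrative. Other textbooks and libraries use different quartile rules, so their results can differ slightly on small data sets.

```python
import math

def quartile(xs, p):
    """Value at position p * (n + 1) of the ranked data, interpolating linearly.

    p = 0.25 gives Q1, p = 0.5 the median, p = 0.75 gives Q3.
    """
    ranked = sorted(xs)
    n = len(ranked)
    pos = p * (n + 1)              # 1-based position in the ranked data
    k = math.floor(pos)            # rank at or just below the position
    frac = pos - k                 # fraction of the way towards rank k + 1
    if k < 1:                      # position falls before the first value
        return ranked[0]
    if k >= n:                     # position falls at or beyond the last value
        return ranked[-1]
    return ranked[k - 1] + frac * (ranked[k] - ranked[k - 1])

def iqr(xs):
    """Interquartile range Q3 - Q1."""
    return quartile(xs, 0.75) - quartile(xs, 0.25)

ex1 = [11, 12, 13, 16, 16, 17, 18, 21, 22]
ex2 = [7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45]
for data in (ex1, ex2):
    print(quartile(data, 0.25), quartile(data, 0.75), iqr(data))
# 12.5 19.5 7.0
# 9.25 16.25 7.0
```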
Numerical Descriptive
Measures of a Population
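The numerical descriptive measures discussed so far describe a sample, not the population. The corresponding population measures are computed from all N values in the population.
The population mean μ:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
where μ = population mean, N = population size, and $X_i$ = ith observation of the variable X.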
Numerical Descriptive Measures for a Population: The Population Variance σ²
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
where μ = population mean, N = population size, and $X_i$ = ith observation of the variable X.
Numerical Descriptive Measures for a Population: The Population Standard Deviation σ
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
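The population formulas divide by N, whereas the sample formulas divide by n − 1. A minimal Python sketch of the difference, assuming (for illustration only) that the 8 values of the earlier sample are an entire population; `population_variance` is an illustrative name, and `statistics.pvariance` is the standard-library equivalent.

```python
import math
import statistics

def population_variance(xs):
    """sigma^2 = sum((X_i - mu)^2) / N: divides by N, not N - 1."""
    N = len(xs)
    if N == 0:
        raise ValueError("empty population")
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N

# Assumption for illustration: treat these 8 values as a complete population.
values = [2, 3, 5, 1, 4, 3, 2, 4]
sigma2 = population_variance(values)
print(sigma2, round(math.sqrt(sigma2), 4))        # 1.5 1.2247
print(round(statistics.variance(values), 3))      # 1.714 (sample formula, n - 1)
assert math.isclose(sigma2, statistics.pvariance(values))
```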
Sample Statistics versus Population Parameters
Measure              Population Parameter   Sample Statistic
Mean                 μ                      x̄
Variance             σ²                     s²
Standard deviation   σ                      s
Proportion           π                      p
Approximating the Mean,
Variance and Standard
deviation from grouped data
Computing Numerical Descriptive
Measures From A Frequency
Distribution
We can only compute approximations to the mean, variance and standard deviation of grouped data, because the individual observations within each class are no longer known.
Approximating the Sample Mean
from a Frequency Distribution
Use the midpoint of a class interval to approximate the values
in that class
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{n}$$
where n = number of observations (sample size), k = number of classes in the frequency distribution, $x_i$ = midpoint of class i, and $f_i$ = frequency of observations in class i.
Example 1
The table below gives the commuting times (in minutes) from home to work for 30 employees of a company.
18 15 7 24 10
23 28 10 16 12
5 23 24 16 19
26 17 27 17 17
29 18 23 9 26
12 22 14 26 22
Descriptive Statistics on Raw Data
       n    Range   Mean    Std. Deviation   Variance
Time   30   24      18.50   6.627            43.914
Question 1
Frequency Distribution
Class Limits    fi    xi      fi·xi
5 ≤ x < 10      3     7.5     22.5
10 ≤ x < 15     5     12.5    62.5
15 ≤ x < 20     9     17.5    157.5
20 ≤ x < 25     7     22.5    157.5
25 ≤ x < 30     6     27.5    165.0
Total           30            565.0
$$\text{Mean} = \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{n} = \frac{565}{30} = 18.833 \text{ minutes} \approx 18.8 \text{ minutes}$$
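A minimal Python sketch that builds the frequency distribution from the 30 raw commuting times, computes the grouped (midpoint) mean, and compares it with the raw-data mean. The variable names are illustrative; the classes are assumed to be 5 ≤ x < 10, ..., 25 ≤ x < 30, as in the table.

```python
commute_minutes = [
    18, 15, 7, 24, 10,
    23, 28, 10, 16, 12,
    5, 23, 24, 16, 19,
    26, 17, 27, 17, 17,
    29, 18, 23, 9, 26,
    12, 22, 14, 26, 22,
]

# Classes lower <= x < lower + width, with width 5.
lowers = [5, 10, 15, 20, 25]
width = 5

freqs = [sum(lo <= x < lo + width for x in commute_minutes) for lo in lowers]
mids = [lo + width / 2 for lo in lowers]          # class midpoints x_i
n = sum(freqs)

grouped_mean = sum(f * m for f, m in zip(freqs, mids)) / n
raw_mean = sum(commute_minutes) / len(commute_minutes)

print(freqs)                    # [3, 5, 9, 7, 6]
print(round(grouped_mean, 3))   # 18.833
print(raw_mean)                 # 18.5
```

The gap between 18.833 and 18.5 is the price of replacing each observation with its class midpoint.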
Approximating the Sample Standard Deviation from a Frequency Distribution
$$s = \sqrt{\frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 f_i}{n - 1}}$$
where n = number of observations (sample size), k = number of classes in the frequency distribution, $x_i$ = class midpoint, and $f_i$ = frequency of observations in class i.
Question 2
Frequency Distribution (computed with the unrounded mean x̄ = 565/30 = 18.833 minutes)
Class Limits    fi    Midpoint xi   (xi − x̄)   (xi − x̄)²   (xi − x̄)²fi
5 ≤ x < 10      3     7.5           −11.333    128.444     385.333
10 ≤ x < 15     5     12.5          −6.333     40.111      200.556
15 ≤ x < 20     9     17.5          −1.333     1.778       16.000
20 ≤ x < 25     7     22.5          3.667      13.444      94.111
25 ≤ x < 30     6     27.5          8.667      75.111      450.667
Total           30                                         1146.667
$$\sum_{i=1}^{k} (x_i - \bar{x})^2 f_i = 1146.667$$
$$\text{Variance} = s^2 = \frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 f_i}{n - 1} = \frac{1146.667}{29} = 39.540$$
The Standard Deviation
$$s = \sqrt{s^2} = \sqrt{39.540} = 6.288 \text{ minutes} \approx 6.3 \text{ minutes}$$
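A minimal Python sketch of the grouped-data variance and standard deviation (class midpoints weighted by frequencies, divided by n − 1), reproducing Question 2. The function name `grouped_mean_sd` is illustrative.

```python
import math

def grouped_mean_sd(mids, freqs):
    """Approximate sample mean, sum of squares, variance and SD from grouped data."""
    n = sum(freqs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(f * x for x, f in zip(mids, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs))   # sum of (x_i - mean)^2 f_i
    variance = ss / (n - 1)
    return mean, ss, variance, math.sqrt(variance)

mids = [7.5, 12.5, 17.5, 22.5, 27.5]
freqs = [3, 5, 9, 7, 6]
mean, ss, var, sd = grouped_mean_sd(mids, freqs)
print(round(ss, 3), round(var, 3), round(sd, 3))   # 1146.667 39.54 6.288
```

These grouped approximations (s² ≈ 39.540, s ≈ 6.288) differ from the raw-data values (43.914 and 6.627) because each observation is replaced by its class midpoint.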
Class Exercise 1
The frequency distribution table below gives the
number of iPods sold by a shop on each of 30 days.
Calculate the mean, variance and standard
deviation.
iPods sold   f
5-9          3
10-14        6
15-19        8
20-24        8
25-29        5
Total        30
Class Exercise 2
Sambiri Silicon manufactures computer monitors.
The following table represents the distribution of
computer monitors produced at the company for
a sample of 30 days. Calculate the mean, variance
and standard deviation.
Class Limits   f
21-23          7
24-26          6
27-29          6
30-32          4
33-35          7
Total          30
Class Exercise 3
A sample of 40 randomly selected households
from a city produced the following distribution of
the number of vehicles owned. Find the mean,
variance and standard deviation.
Vehicles owned   f
0                2
1                18
2                11
3                4
4                3
5                2
Total            40
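Approximating the Median from grouped data
Approximating the Median from a Frequency Distribution
$$Me = L + \frac{c\,[0.5n - CF]}{f_{me}}$$
where L = lower limit of the median interval, c = class width, n = total number of observations, CF = cumulative frequency of the classes below the median interval, and $f_{me}$ = frequency of the median interval.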
Commuting Times Example
Find:
1. The median.
Class Limits    fi    CF (cumulative frequency)
5 ≤ x < 10      3     3
10 ≤ x < 15     5     8
15 ≤ x < 20     9     17
20 ≤ x < 25     7     24
25 ≤ x < 30     6     30
n = Σfi = 30
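1. The Median Commuting Time
Median position: n/2 = 30/2 = 15th observation.
The median interval is 15 to < 20, as it contains the 15th observation (the cumulative frequency passes 15 in this interval: 8 < 15 ≤ 17).
L = 15, c = 5, n = 30, $f_{me}$ = 9, CF = 8
$$Me = 15 + \frac{5\,[0.5 \times 30 - 8]}{9} = 18.889 \approx 18.9 \text{ minutes}$$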
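A minimal Python sketch of the grouped-median formula, assuming equal-width classes given by their lower limits. The function name `grouped_median` is illustrative; it walks the cumulative frequencies to find the median interval and then applies the formula.

```python
def grouped_median(lowers, width, freqs):
    """Me = L + c * (0.5 n - CF) / f_me for equal-width classes."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("need a positive total frequency")
    target = 0.5 * n
    cf = 0                                   # cumulative frequency below the current class
    for lower, f in zip(lowers, freqs):
        if cf + f >= target:                 # first class whose CF reaches n/2
            return lower + width * (target - cf) / f
        cf += f
    raise AssertionError("unreachable: cumulative frequency always reaches n/2")

lowers = [5, 10, 15, 20, 25]
freqs = [3, 5, 9, 7, 6]
print(round(grouped_median(lowers, 5, freqs), 3))   # 18.889
```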
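Approximating the Mode from grouped data
Approximating the Mode from a Frequency Distribution
$$Mo = L + \frac{c\,(f_m - f_{m-1})}{2f_m - f_{m-1} - f_{m+1}}$$
where L = lower limit of the modal interval, c = class width, $f_m$ = frequency of the modal interval, and $f_{m-1}$ and $f_{m+1}$ = frequencies of the intervals immediately before and after it.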
Class Limits    fi
5 to < 10       3
10 to < 15      5
15 to < 20      9    (modal interval)
20 to < 25      7
25 to < 30      6
n = Σfi = 30
2. The Modal Commuting Time
Identify the modal interval: the interval with the highest frequency, 15 to < 20 ($f_m$ = 9, $f_{m-1}$ = 5, $f_{m+1}$ = 7).
$$Mo = 15 + \frac{5\,(9 - 5)}{2(9) - 5 - 7} = 18.333 \approx 18.3 \text{ minutes}$$
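A minimal Python sketch of the grouped-mode formula. The function name `grouped_mode` is illustrative, and two choices are assumptions not stated in the notes: a tie for the highest frequency picks the first such class, and a missing neighbouring class (modal class at either end of the table) counts as frequency 0.

```python
def grouped_mode(lowers, width, freqs):
    """Mo = L + c (f_m - f_{m-1}) / (2 f_m - f_{m-1} - f_{m+1})."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])    # modal class (first if tied)
    f_m = freqs[m]
    f_before = freqs[m - 1] if m > 0 else 0               # assumption: 0 outside the table
    f_after = freqs[m + 1] if m + 1 < len(freqs) else 0
    denom = 2 * f_m - f_before - f_after
    if denom == 0:
        raise ValueError("formula undefined: flat frequencies around the modal class")
    return lowers[m] + width * (f_m - f_before) / denom

print(round(grouped_mode([5, 10, 15, 20, 25], 5, [3, 5, 9, 7, 6]), 3))   # 18.333
```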
Finding the mode using the graphical method.
[Figure: histogram of commuting times (Number of Minutes): frequency against class midpoint (7.5, 12.5, 17.5, 22.5, 27.5).]
Class Exercise 1
Find:
1. Q1
2. Q3
3. The IQR
Class Limits    fi    CF
5 ≤ x < 10      3     3
10 ≤ x < 15     5     8
15 ≤ x < 20     9     17
20 ≤ x < 25     7     24
25 ≤ x < 30     6     30
n = 30
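Q1 = 25th percentile
0.25 × 30 = 7.5, which falls in the interval 10 ≤ x < 15 (CF rises from 3 to 8):
$$P_{25} = 10 + \frac{5\,[0.25 \times 30 - 3]}{5} = 14.5$$
Q3 = 75th percentile
0.75 × 30 = 22.5, which falls in the interval 20 ≤ x < 25 (CF rises from 17 to 24):
$$P_{75} = 20 + \frac{5\,[0.75 \times 30 - 17]}{7} = 23.929 \approx 23.9$$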
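The same interpolation works for any percentile: replace 0.5n in the median formula with p·n. A minimal Python sketch under that generalisation, assuming equal-width classes; `grouped_percentile` is an illustrative name.

```python
def grouped_percentile(lowers, width, freqs, p):
    """P = L + c (p n - CF) / f for the class where the cumulative frequency reaches p n."""
    n = sum(freqs)
    if n <= 0 or not 0 < p < 1:
        raise ValueError("need positive frequencies and 0 < p < 1")
    target = p * n
    cf = 0                                   # cumulative frequency below the current class
    for lower, f in zip(lowers, freqs):
        if cf + f >= target:
            return lower + width * (target - cf) / f
        cf += f
    raise AssertionError("unreachable: cumulative frequency reaches n")

lowers, freqs = [5, 10, 15, 20, 25], [3, 5, 9, 7, 6]
q1 = grouped_percentile(lowers, 5, freqs, 0.25)
q3 = grouped_percentile(lowers, 5, freqs, 0.75)
print(round(q1, 3), round(q3, 3), round(q3 - q1, 3))   # 14.5 23.929 9.429
```

Without intermediate rounding the IQR is 9.429; rounding Q3 to 23.9 first gives 9.4.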
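The Interquartile Range
Example:
[Figure: number line from X minimum through Q1, the median (Q2) and Q3 to X maximum; each of the four sections holds 25% of the data, and the interquartile range spans Q1 to Q3.]
Interquartile range = Q3 − Q1 = 23.9 − 14.5 = 9.4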
Class Exercise 2