Lecture 2b - Descriptive Statistics II
NUMERICAL MEASURES
Numerical Measures in Statistics
• Measures of central tendency / location
• Measures that are computed from data within a population are called
population parameters
10 20 36 92 95 40 50 56 60 70
92 88 80 70 72 70 36 40 36 40
92 40 50 50 56 60 70 60 60 88
Solution
$3500 $2850
$4200 $3680
$4550 $2730
$14550 $3200
$4050 $3990
• Excluding the $14,550 salary: 𝑥̅ = $32,750 / 9
• 𝑥̅ = $3,638.89
• Including all ten salaries: 𝑥̅ = $47,300 / 10
• 𝑥̅ = $4,730
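The two means above can be reproduced with a short Python sketch. It treats the $14,550 salary as the extreme value left out of the nine-value mean, which is what the totals imply; the variable names are illustrative only.

```python
from statistics import mean

# Salaries from the solution above (in dollars)
salaries = [3500, 2850, 4200, 3680, 4550, 2730, 14550, 3200, 4050, 3990]

# Mean of all ten values
mean_all = mean(salaries)

# Mean after leaving out the extreme value (assumed to be the $14,550 salary)
without_extreme = [s for s in salaries if s != 14550]
mean_without = mean(without_extreme)

print(f"Mean of all {len(salaries)} salaries: ${mean_all:,.2f}")
print(f"Mean of {len(without_extreme)} salaries without the extreme value: ${mean_without:,.2f}")
```

A single extreme value moves the mean by more than $1,000 here, which is why the median and the trimmed mean are also worth reporting.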
• The median gives the centre of a histogram with half the values on the
left of the median and half on the right
Median
• When the number of values is even, the median is found by adding the
two middle values and dividing by 2.
Median
• The following data are the mm of rainfall over a 12-month period:
97 40 21 4 74 65 123 34 23 48 3 18
Arranging in ascending order:
3 4 18 21 23 34 40 48 65 74 97 123
Since there are 12 values, there are 2 values in the middle (34 and 40).
Median
Median = (34 + 40) / 2 = 74 / 2 = 37
Median
• Often, a dataset is too large to identify the middle value
straight away.
• In such a case, we use a general formula to find the position of the
median, as follows.
• For an odd number of cases: Median = ((n + 1) / 2)th term
• For an even number of cases: Median = [(n / 2)th term + (n / 2 + 1)th term] / 2
• For example: If a dataset has 500 cases, the median would be
(250th term + 251st term) / 2
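The position rules above can be sketched in Python as follows. The function name `median_by_position` is hypothetical, and the positions are 1-based as in the formulas, so they are shifted by one when indexing the sorted list.

```python
def median_by_position(values):
    """Return the median using the position rules for odd and even n."""
    ordered = sorted(values)  # positions refer to the data in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th term (1-based position)
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the (n / 2)th and (n / 2 + 1)th terms
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


# Rainfall data (mm) from the 12-month example
rainfall = [97, 40, 21, 4, 74, 65, 123, 34, 23, 48, 3, 18]
rainfall_median = median_by_position(rainfall)
print(rainfall_median)  # 37.0
```

For a dataset of 500 cases the same function averages the 250th and 251st terms of the sorted data.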
Mode
• The mode of the dataset is the value that occurs the most often.
• The greatest frequency can occur at more than one value, so a dataset can have more than one mode.
• If a dataset has exactly two modes, the data are bimodal.
• If a dataset has more than two modes, the data are multimodal.
• Some data may have no mode, if each data point occurs only
once.
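The cases listed above (one mode, bimodal, multimodal, no mode) can be illustrated with a minimal Python sketch. The helper names `find_modes` and `modality` and the small datasets are hypothetical; the sketch follows the convention stated above that data in which every value occurs only once have no mode, and it assumes a non-empty dataset.

```python
from collections import Counter


def find_modes(values):
    """Return the sorted list of modes, or [] if every value occurs once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # each data point occurs only once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


def modality(values):
    """Label the data by how many modes it has."""
    k = len(find_modes(values))
    return {0: "no mode", 1: "one mode", 2: "bimodal"}.get(k, "multimodal")


# Hypothetical datasets, for illustration only
print(find_modes([1, 2, 2, 3]), modality([1, 2, 2, 3]))
print(find_modes([1, 1, 2, 2, 3]), modality([1, 1, 2, 2, 3]))
print(find_modes([1, 1, 2, 2, 3, 3, 4]), modality([1, 1, 2, 2, 3, 3, 4]))
print(find_modes([1, 2, 3]), modality([1, 2, 3]))
```

Because the function only counts occurrences, it works equally well on qualitative data such as `["red", "blue", "red"]`.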
Mode Example
The data on the right show the monthly rents
of seventy apartments.
Mode, Median, Mean
• The mode can be calculated on both quantitative and qualitative
data. The mean and median can only be calculated on quantitative
data.
• A data set can have zero or more than one mode, but there can only
be one mean or median.
Trimmed Mean
• The trimmed mean is a measure for calculating the mean when
extreme values are present.
• To obtain the trimmed mean, we delete a given percentage of the
largest and smallest values.
• The mean is then calculated on the remaining values.
• For example, the 5% trimmed mean is obtained by deleting the
smallest 5% and the largest 5% of values and calculating the
mean on the remaining values.
• In order to calculate the trimmed mean, the values must be
arranged in ascending order before removing the largest and
smallest values.
Example
• The following data are test scores from a Spanish test of 19 students:
72, 99, 98, 76, 92, 45, 91, 91, 85, 90, 87, 88, 85, 85, 80, 79, 67, 66, 87
Find the 5% trimmed mean.
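• 5% of 19 values is 0.95, which rounds to 1, so the smallest value (45)
and the largest value (99) are removed.
• 5% trimmed mean = 1419 / 17 ≈ 83.5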
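The trimmed-mean steps can be sketched in Python as below. The function name `trimmed_mean` is hypothetical, and rounding the number of trimmed values to the nearest whole number (5% of 19 is 0.95, taken as 1) is an assumption of this sketch; other tools may round differently.

```python
def trimmed_mean(values, percent):
    """Mean after removing `percent` % of the values from each end."""
    ordered = sorted(values)  # values must be in ascending order first
    # Number of values to drop from EACH end (assumption: round to nearest)
    k = round(len(ordered) * percent / 100)
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trimming removed every value")
    return sum(kept) / len(kept)


# Spanish test scores of 19 students from the example
scores = [72, 99, 98, 76, 92, 45, 91, 91, 85, 90,
          87, 88, 85, 85, 80, 79, 67, 66, 87]

result = trimmed_mean(scores, 5)
print(round(result, 1))  # 83.5
```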
• Weighted mean = Σ(wi xi) / Σwi
• Where
• xi = value for observation i
• wi = weight for observation i
• i.e. numerator = sum of weighted data values
• denominator = sum of weights
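As an illustration of the numerator and denominator described above, here is a minimal Python sketch of a weighted mean. The function name `weighted_mean` and the marks and weights are hypothetical and are not taken from the lecture.

```python
def weighted_mean(values, weights):
    """Sum of weighted data values divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    numerator = sum(w * x for x, w in zip(values, weights))  # sum of wi * xi
    denominator = sum(weights)                               # sum of wi
    return numerator / denominator


# Hypothetical example: three marks weighted by credit hours
marks = [70, 80, 90]
credits = [2, 3, 5]

wm = weighted_mean(marks, credits)
print(wm)  # 83.0
```

With equal weights the result reduces to the ordinary mean, here 80.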
= [(0.940)(0.920)(0.960)(1.020)(1.054)]^(1/5)
= [0.89254]^(1/5)
= 0.97752
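The calculation above (the fifth root of the product of five factors) can be checked with a short Python sketch. The variable names are illustrative, and `math.prod` assumes Python 3.8 or later.

```python
import math

# The five factors from the calculation above
factors = [0.940, 0.920, 0.960, 1.020, 1.054]

product = math.prod(factors)              # product of the n factors
geo_mean = product ** (1 / len(factors))  # nth root of the product

print(round(product, 5))
print(round(geo_mean, 5))
```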
AND
Range = 37 – 19 = 18
The range of ages in the study is 18 years.
Disadvantages of the Range
• The standard deviation tells us how closely the values of the data are clustered around the
mean.
• The lower the standard deviation, the closer the values are clustered around the mean.
• The larger the standard deviation, the further the values are spread out from the mean.
• The standard deviation is obtained by taking the positive square root of the variance.
Variance & Standard Deviation
• The variance calculated for population data is denoted as 𝜎² (sigma
squared)
• This difference is known as the deviation about the mean (𝑥̅ for a
sample and 𝜇 for the population)
• 𝜎 = √( Σ(x − 𝜇)² / N ) and s = √( Σ(x − 𝑥̅)² / (n − 1) )
𝑥̅ = 63
SOLUTION CONT’d
x     𝑥̅     x − 𝑥̅    (x − 𝑥̅)²
20    63    -43      1849
40    63    -23      529
60    63    -3       9
60    63    -3       9
75    63    12       144
80    63    17       289
70    63    7        49
65    63    2        4
70    63    7        49
90    63    27       729
                     Σ = 3660
Solution cont’d
• Sample variance: s² = Σ(x − 𝑥̅)² / (n − 1) = 3660 / 9
• = 406.67
Solution cont’d
• Sample Standard Deviation is given by
• s = √( Σ(x − 𝑥̅)² / (n − 1) )
• = √406.67
• = 20.17
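The worked solution can be reproduced step by step with the Python sketch below, which rebuilds the deviation columns and then cross-checks the result against the standard library's `statistics.variance` and `statistics.stdev` (both use the n − 1 divisor). The variable names are illustrative.

```python
import math
from statistics import mean, stdev, variance

# The ten data values from the solution table
data = [20, 40, 60, 60, 75, 80, 70, 65, 70, 90]

x_bar = mean(data)                         # sample mean
deviations = [x - x_bar for x in data]     # x minus the mean
squared = [d ** 2 for d in deviations]     # squared deviations
sum_sq = sum(squared)                      # sum of squared deviations

s2 = sum_sq / (len(data) - 1)              # sample variance (divide by n - 1)
s = math.sqrt(s2)                          # sample standard deviation

print(x_bar, sum(deviations), sum_sq)      # 63 0 3660
print(round(s2, 2), round(s, 2))           # 406.67 20.17

# Cross-check with the standard library
assert math.isclose(s2, variance(data))
assert math.isclose(s, stdev(data))
```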
Variance & Standard Deviation
• The deviations from the mean are squared because the raw deviations
always sum to zero, which would wrongly imply that there is no deviation
from the mean.
• Having no deviation from the mean is not a true measure of the spread.
x     𝑥̅     x − 𝑥̅
20    63    -43
40    63    -23
60    63    -3
60    63    -3
75    63    12
80    63    17
70    63    7
65    63    2
70    63    7
90    63    27
            Σ = 0
Coefficient of Variation (CV)
• The coefficient of variation is a measure of how large the
standard deviation is in relation to the mean.
• For population data, the CV = (𝜎 / 𝜇) x 100%
• For sample data, the CV = (s / 𝑥̅) x 100%
Example
• The mean scores of two students, Sonia and Mark, in 5 subjects are 96
and 92 with standard deviations of 2.4 and 4.6 respectively. Who is
the more consistent performer?
• We can answer this question using the coefficient of variation as
follows:
CV for Sonia = (2.4 / 96) x 100% = 2.5%
CV for Mark = (4.6 / 92) x 100% = 5%
• Sonia has the lower CV, so she is the more consistent performer.
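The comparison of the two students can be sketched in Python as follows. The function name `coefficient_of_variation` is hypothetical; it simply divides the standard deviation by the mean and expresses the result as a percentage.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean_value * 100


# Means and standard deviations from the example
cv_sonia = coefficient_of_variation(2.4, 96)
cv_mark = coefficient_of_variation(4.6, 92)

# The lower CV indicates less variation relative to the mean
more_consistent = "Sonia" if cv_sonia < cv_mark else "Mark"

print(f"CV for Sonia = {cv_sonia:.1f}%")
print(f"CV for Mark = {cv_mark:.1f}%")
print(f"More consistent performer: {more_consistent}")
```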
• The mean for grouped sample data is given by: 𝑥̅ = Σfm / n, where f is
the class frequency and m is the class midpoint
Solution cont’d
• 𝑥̅ = Σfm / n = 1288 / 21 = 61.3 seconds
Variance & Standard deviation of grouped
data
• The formula for standard deviation and variance of grouped data is given by:
• 𝜎² = Σf(m − 𝜇)² / N and s² = Σf(m − 𝑥̅)² / (n − 1)
• The data below show the time in minutes for 25 employees to get to
work
𝜇 = Σmf / N
Solution cont’d
Commute time          Frequency (f)   Midpoint (m)   mf     m − 𝜇    (m − 𝜇)²   f(m − 𝜇)²
0 to less than 10     4               5              20     -16.4    268.96     1075.84
10 to less than 20    9               15             135    -6.4     40.96      368.64
20 to less than 30    6               25             150    3.6      12.96      77.76
30 to less than 40    4               35             140    13.6     184.96     739.84
40 to less than 50    2               45             90     23.6     556.96     1113.92
Total                 25                             535                        Σ = 3376
𝜇 = Σmf / N = 535 / 25 = 21.4
𝜎² = Σf(m − 𝜇)² / N
𝜎² = 3376 / 25
= 135.04
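The grouped-data solution can be reproduced with the Python sketch below. It follows the worked solution in treating the 25 commute times as a population (dividing by N), and it takes each class midpoint as the average of the class limits; the variable names are illustrative.

```python
import math

# (lower limit, upper limit, frequency) for each commute-time class, in minutes
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 6), (30, 40, 4), (40, 50, 2)]

midpoints = [(low + high) / 2 for low, high, _ in classes]  # m
freqs = [f for _, _, f in classes]                          # f

N = sum(freqs)                                              # total frequency
sum_mf = sum(m * f for m, f in zip(midpoints, freqs))       # sum of m * f
mu = sum_mf / N                                             # grouped mean

# Sum of f * (m - mu)^2, then divide by N for the population variance
sum_sq = sum(f * (m - mu) ** 2 for m, f in zip(midpoints, freqs))
variance = sum_sq / N
std_dev = math.sqrt(variance)  # positive square root of the variance

print(N, sum_mf, round(mu, 1))               # 25 535.0 21.4
print(round(sum_sq, 2), round(variance, 2))  # 3376.0 135.04
print(round(std_dev, 2))
```

For sample data the only change would be to divide `sum_sq` by `N - 1` instead of `N`.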
2. Empirical rule
3. Percentile rank
End of Descriptive Statistics II