03 Descriptive-Numerical
2
Definitions
3
Course outline
4
Measures of location/central tendency
§ Mean
§ Median
§ Mode
§ Percentiles
§ Quartiles
5
Measures of central tendency
the arithmetic mean
§ The mean of a data set is the average of all the data
values.
§ The mean provides a measure of central location for the
data.
§ If the data are for a sample, the mean is denoted by 𝑥̅
§ If the data are for a population, the mean is denoted by
the Greek letter µ.
6
Measures of central tendency
the arithmetic mean
7
Measures of central tendency
the arithmetic mean
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
where n is the sample size and X_1, ..., X_n are the observed values.
8
Measures of central tendency
the arithmetic mean
§ The most common measure of central tendency
§ Mean = sum of values divided by the number of values
§ Affected by extreme values (outliers)
Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5) / 5 = 15 / 5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10) / 5 = 20 / 5 = 4
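The two calculations above can be reproduced with a short Python sketch. It uses the standard-library statistics module; the variable names are illustrative and not from the slides.

```python
from statistics import mean

# The two five-value data sets from the slide.
without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

# Mean = sum of values divided by the number of values.
mean_a = sum(without_outlier) / len(without_outlier)
mean_b = sum(with_outlier) / len(with_outlier)

# statistics.mean applies the same definition.
print(mean_a, mean(without_outlier))
print(mean_b, mean(with_outlier))
```

Replacing the single value 5 with 10 raises the mean from 3 to 4, which is the sensitivity to extreme values noted in the bullets.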
9
Example: Apartment rents
10
Measures of central tendency
the median
§ The median is another measure of central location.
§ The median is the value in the middle when the data are arranged
in ascending order (smallest value to largest value).
§ The median is the measure of location most often reported for
annual income and property value data.
§ A few extremely large incomes or property values can inflate the
mean.
§ A sample median is denoted by x̃ and a population median by μ̃.
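As a minimal sketch of the definition, the median can be found by sorting and taking the middle value. The income figures below are hypothetical, and averaging the two middle values for an even count is a common convention that the slide does not state.

```python
from statistics import mean, median

def middle_value(values):
    """Median by definition: the middle of the data in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even count: average the two middle values (assumed convention).
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical annual incomes in thousands; the last value is extreme.
incomes = [30, 35, 40, 45, 500]

print(middle_value(incomes), median(incomes))  # both report the middle value
print(mean(incomes))                           # pulled upward by the 500
```

Here the one extreme income inflates the mean while the median stays at the middle observation.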
11
Measures of central tendency
the median
[Figure: two dot plots on a 0 to 10 scale; Median = 4 in both panels]
13
Measures of central tendency
the mode
§ The mode of a data set is the value that occurs with
greatest frequency.
§ The greatest frequency can occur at two or more
different values.
§ If the data have exactly two modes, the data are
bimodal.
§ If the data have more than two modes, the data are
multimodal.
14
Measures of central tendency
the mode
§ Value that occurs most often
§ Not affected by extreme values
§ Used for either numerical or categorical data
§ There may be no mode
§ There may be several modes
[Figure: two dot plots; left panel on a 0 to 14 scale with Mode = 9, right panel on a 0 to 6 scale with no mode]
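The cases listed above (one mode, several modes, no mode, categorical data) can be sketched in Python. The data sets are hypothetical, and treating "every value occurs once" as "no mode" is an assumed convention; statistics.multimode on its own would return every value in that case.

```python
from statistics import multimode

def modes(values):
    """Return the most frequent value(s); [] when no value repeats."""
    values = list(values)
    if len(values) > 1 and len(set(values)) == len(values):
        return []  # every value occurs once: report "no mode"
    return multimode(values)

# Hypothetical data sets.
one_mode = [0, 3, 5, 9, 9, 9, 12, 14]
no_mode = [0, 1, 2, 3, 4, 5, 6]
two_modes = [1, 1, 2, 3, 3]               # bimodal
colours = ["red", "blue", "red", "green"]  # categorical data

for data in (one_mode, no_mode, two_modes, colours):
    print(modes(data))
```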
15
Example: apartment rents
16
Measures of central tendency
which measure to choose?
§ The mean is generally used, unless extreme values
(outliers) exist.
§ When outliers exist, the median is often used instead, since it is
not sensitive to extreme values. For example, median
home prices may be reported for a region.
17
Percentiles
18
Percentiles
19
Example: apartment rents
20
Quartiles
§ Quartiles are specific percentiles that divide the data into four parts, each
containing approximately one-fourth, or 25%, of the observations
§ First Quartile = 25th Percentile
§ Second Quartile = 50th Percentile = Median
§ Third Quartile = 75th Percentile
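A short Python sketch of the three quartiles, using a hypothetical sample. Textbooks and software use slightly different interpolation rules for percentiles; this sketch relies on the default "exclusive" method of statistics.quantiles, which may not match the rule used in this course.

```python
from statistics import median, quantiles

# Hypothetical sample of 11 observations, not from the slides.
data = [12, 15, 18, 20, 22, 25, 28, 30, 35, 40, 45]

# n=4 splits the data into four parts and returns [Q1, Q2, Q3].
q1, q2, q3 = quantiles(data, n=4)

print("Q1 (25th percentile):", q1)
print("Q2 (50th percentile):", q2, "median:", median(data))
print("Q3 (75th percentile):", q3)
```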
21
Example: apartment rents
22
Measures of central tendency
the geometric mean
§ Geometric mean
§ Used to measure the rate of change of a variable over time
$$\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$$
§ Geometric mean rate of return
$$\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$$
§ Where R_i is the rate of return in time period i
23
Measures of central tendency
the geometric mean
Arithmetic mean rate of return: $\bar{X} = \frac{(-.5) + (1)}{2} = .25$ (misleading result)
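To see why the arithmetic result is misleading, the geometric mean rate of return can be applied to the same two rates, -.5 and 1. This is a sketch; the function name is illustrative and not from the slides.

```python
import math

def geometric_mean_return(rates):
    """[(1 + R1) x (1 + R2) x ... x (1 + Rn)]^(1/n) - 1."""
    growth = math.prod(1 + r for r in rates)
    return growth ** (1 / len(rates)) - 1

# The two period returns from the slide: a 50% loss, then a 100% gain.
rates = [-0.5, 1.0]

arithmetic = sum(rates) / len(rates)
geometric = geometric_mean_return(rates)

print("arithmetic mean rate of return:", arithmetic)
print("geometric mean rate of return:", geometric)
```

A 50% loss followed by a 100% gain brings a value back to where it started, so the geometric mean rate of return is zero, while the arithmetic mean suggests a 25% gain per period.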
25
Measures of central tendency
summary
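Central tendency
§ Arithmetic mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
§ Median: middle value in the ordered array
§ Mode: most frequently observed value
§ Geometric mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$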
26
Measures of variation
§ It is often desirable to consider measures of variability
(dispersion), as well as measures of location.
§ For example, in choosing supplier A or supplier B we might
consider not only the average delivery time for each, but
also the variability in delivery time for each.
27
Measures of variation
§ Range
§ Interquartile range
§ Variance
§ Standard deviation
§ Coefficient of variation
28
Range
§ The range of a data set is the difference between the largest and
smallest data values.
§ It is the simplest measure of variation.
§ It is very sensitive to the smallest and largest data values.
Example:
[Figure: dot plot on a 0 to 14 scale]
Range = 13 - 1 = 12
29
Example: apartment rents
30
Measures of variation
disadvantages of the range
[Figure: two dot plots on a 7 to 12 scale; Range = 12 - 7 = 5 in both panels]
§ Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
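The two ranges above can be checked with a few lines of Python. The lists reproduce the 25 values on the slide; the function name is illustrative.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# The 25 values from the slide: eleven 1s, eight 2s, four 3s, then 4 and 5.
original = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]

# Same data with the last value replaced by the outlier 120.
with_outlier = original[:-1] + [120]

print(value_range(original))
print(value_range(with_outlier))
```

Changing one observation out of 25 moves the range from 4 to 119, even though the other 24 values are untouched.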
31
Interquartile range
§ The interquartile range (IQR) reduces the problems caused by outliers.
§ It discards the values below the first quartile and above the third
quartile and takes the range of the middle 50% of the data.
§ Interquartile range = 3rd quartile – 1st quartile
= Q3 – Q1
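As a sketch of how the IQR resists outliers, the code below reuses the 25-value data set from the range slide, with and without the value 120. The quartiles come from statistics.quantiles with its default method, which is an assumption; other quartile conventions can give slightly different values.

```python
from statistics import quantiles

def interquartile_range(values):
    """IQR = Q3 - Q1, using the default method of statistics.quantiles."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

# Data from the range slide: eleven 1s, eight 2s, four 3s, then 4 and 5.
original = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = original[:-1] + [120]

print(interquartile_range(original))
print(interquartile_range(with_outlier))
```

The range of these two data sets differs greatly, but the IQR is the same for both because the outlier lies outside the middle 50% of the data.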
Example:
Minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, Maximum = 70
(25% of the observations fall in each of the four intervals)
Interquartile range = 57 – 30 = 27
32
Example: apartment rents
33
Measures of variation
standard deviation
§ The variance is a measure of variability that utilizes all
the data.
§ It is based on the difference between the value of each
observation (x_i) and the mean (𝑥̅ for a sample, µ for a
population).
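A minimal sketch of the sample variance and standard deviation, built from the deviations described above. It assumes the usual sample definition, which divides the sum of squared deviations by n - 1 (the population variance divides by N instead). The data are hypothetical.

```python
import math
from statistics import stdev, variance

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

# Hypothetical sample of eight observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s_squared = sample_variance(data)
s = math.sqrt(s_squared)  # the standard deviation is the square root

print(s_squared, variance(data))  # manual result vs. standard library
print(s, stdev(data))
```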
34
Measures of variation
standard deviation
35
Measures of variation
standard deviation
36
Example – mean class
37
Example – starting salary
38
Measures of variation
standard deviation
39
Example – starting salary
40
Measures of variation
comparing standard deviation
Data A (dot plot on an 11 to 21 scale): Mean = 15.5, S = 3.338
41
Measures of variation
comparing standard deviation
42
Measures of variation
summary characteristics
§ The more the data are spread out, the greater the range,
interquartile range, variance, and standard deviation.
§ The more the data are concentrated, the smaller the
range, interquartile range, variance, and standard
deviation.
§ If the values are all the same (no variation), all these
measures will be zero.
§ None of these measures are ever negative.
43
Coefficient of variation
44
Example – mean class
45
Measures of distribution shape, relative location
and detecting outliers
§ Distribution Shape
§ z-Scores
§ Chebyshev’s Theorem
§ Empirical Rule
§ Detecting Outliers
46
Shape of distribution
§ Describes how data are distributed
§ Measures of shape
§ Symmetric or skewed
47
Shape of distribution
48
General descriptive statistics using
Microsoft Excel
1. Select Tools.
3. Select Descriptive
Statistics and click OK.
49
General descriptive statistics using
Microsoft Excel
50
z-scores
51
Example – mean class
52
Chebyshev’s theorem
53
Example - Chebyshev’s theorem
54
Example - Chebyshev’s theorem
$$1 - \frac{1}{z^2} = 1 - \frac{1}{(2.4)^2} = 0.826$$
At least 82.6% of the students must have test scores between
58 and 82.
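The bound in this example can be reproduced with a short Python sketch. The function name is illustrative; the only input taken from the slide is z = 2.4.

```python
def chebyshev_lower_bound(z):
    """Minimum share of observations within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

bound = chebyshev_lower_bound(2.4)
print(f"at least {bound:.1%} of the observations")
```

The theorem holds for any distribution shape, so the result is a guaranteed minimum rather than an estimate.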
55
Empirical rule
56
Empirical rule
57
Empirical rule
58
Empirical rule
59
Example – apartment rents
𝑥̅ = 490.8
𝑠 = 54.74
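With these two values, the empirical-rule intervals can be sketched in Python. The sketch assumes the rent data are roughly bell-shaped, which the slide text does not confirm; the percentages are the usual approximate empirical-rule figures.

```python
x_bar = 490.8  # sample mean from the slide
s = 54.74      # sample standard deviation from the slide

# Approximate shares of bell-shaped data within k standard deviations.
shares = {1: "about 68%", 2: "about 95%", 3: "almost all (about 99.7%)"}

# Interval for each k: from x_bar - k*s to x_bar + k*s.
intervals = {k: (x_bar - k * s, x_bar + k * s) for k in shares}

for k, (low, high) in intervals.items():
    print(f"within {k} s: {low:.2f} to {high:.2f} ({shares[k]})")
```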
60
Example – apartment rents
𝑥̅ = 490.8
𝑠 = 54.74
61
Example – apartment rents
𝑥̅ = 490.8
𝑠 = 54.74
62
Detecting outliers
63
Detecting outliers
64
Example – detecting outliers
65
Example: apartment rents
66
Exploratory data analysis
§ Five-Number Summary
1. Minimum/smallest value
2. First quartile (Q1)
3. Median (Q2)
4. Third quartile (Q3)
5. Maximum/largest value
§ Box-and-Whisker Plot
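The five values listed above can be computed with a small Python sketch. The data are hypothetical, and the quartiles use the default method of statistics.quantiles, which is an assumption and may differ slightly from the rule used in this course.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4)
    return ordered[0], q1, q2, q3, ordered[-1]

# Hypothetical sample of seven observations.
data = [3, 7, 8, 5, 12, 14, 21]

print(five_number_summary(data))
```

These five values are also what a box-and-whisker plot is drawn from: the box runs from Q1 to Q3 with a line at the median.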
67
Five-number summary
§ Five-Number Summary
1. Minimum/smallest value
2. First quartile (Q1)
3. Median (Q2)
4. Third quartile (Q3)
5. Maximum/largest value
§ Box-and-Whisker Plot
68
Box-and-whisker plot
70
Measures of association between two
variables
§ Covariance
§ Correlation coefficient
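A minimal sketch of both measures, assuming the usual sample definitions: the sample covariance divides the sum of cross-products of deviations by n - 1, and the correlation coefficient divides the covariance by the product of the two sample standard deviations. The data and function names are hypothetical.

```python
import math

def sample_covariance(xs, ys):
    """Sum of (x - x_bar)(y - y_bar), divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_std(values):
    """Sample standard deviation: the covariance of a variable with itself, rooted."""
    return math.sqrt(sample_covariance(values, values))

def correlation(xs, ys):
    """Covariance scaled by both standard deviations; lies between -1 and +1."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))

# Hypothetical paired data with perfect linear relationships.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises with x
y_down = [10, 8, 6, 4, 2]  # falls as x rises

print(sample_covariance(x, y_up))
print(correlation(x, y_up), correlation(x, y_down))
```

Unlike the covariance, the correlation coefficient does not depend on the units of measurement, which makes it easier to interpret.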
71
Covariance
72
Example - stereo and sound equipment
store
73
Scatter diagram and calculation of sample
covariance for the stereo and sound equipment store
74
Correlation coefficient
75
The correlation coefficient
[Figure: five scatter plots of Y against X, illustrating r = -1, r = -.6, r = 0, r = +1, and r = +.3]
76
Correlation coefficient
77
Example - stereo and sound equipment store
78
Perfect linear relationship
79
The correlation coefficient
using Microsoft Excel
80
The correlation coefficient
using Microsoft Excel
81
The correlation coefficient
using Microsoft Excel
[Figure: scatter plot of Test #2 Score against Test #1 Score, both axes running from 70 to 100]
§ There is a relatively ... #2.
§ Students who scored high
on the first test tended to
score high on the second test.
82
The weighted mean and
working with grouped data
§ Weighted mean
§ Mean for grouped data
§ Variance for grouped data
§ Standard deviation for grouped data
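The four quantities listed above can be sketched in Python under the usual definitions: the weighted mean is the sum of weight times value divided by the sum of weights, and for grouped data each class is represented by its midpoint and weighted by its frequency, with the sample variance dividing by n - 1. All numbers and names below are hypothetical.

```python
import math

def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def grouped_mean_and_variance(midpoints, frequencies):
    """Sample mean and sample variance approximated from class midpoints."""
    n = sum(frequencies)
    mean = sum(f * m for m, f in zip(midpoints, frequencies)) / n
    var = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies)) / (n - 1)
    return mean, var

# Hypothetical weighted data: two values with weights 1 and 3.
w_mean = weighted_mean([10, 20], [1, 3])

# Hypothetical frequency distribution: class midpoints and class counts.
g_mean, g_var = grouped_mean_and_variance([5, 15, 25], [2, 5, 3])
g_std = math.sqrt(g_var)

print(w_mean)
print(g_mean, g_var, g_std)
```

Because the midpoints stand in for the individual observations, the grouped-data results are approximations of the values that the raw data would give.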
83
Weighted mean
84
Example
85
Grouped data
87
Mean and sample variance for grouped
data
88
Example
89
Example
90
References
91