03 Descriptive-Numerical

Engineering Statistics
Descriptive Statistics – Numerical Method

Learning objectives
§ To describe the properties of location or central
tendency, variation and shape in numerical data
§ To calculate descriptive summary measures for a
population
§ To construct and interpret a box-and-whisker plot
§ To describe the covariance and coefficient of
correlation
Definitions
§ The central tendency is the extent to which all the data
values group around a typical or central value.
§ The variation is the amount of dispersion, or scattering,
of values.
§ The shape is the pattern of the distribution of values
from the lowest value to the highest value.
Course outline
§ Measures of location/central tendency
§ Measures of variation
§ Measures of distribution shapes, relative location and
detecting outliers
§ Exploratory data analysis
§ Measures of association between two variables
§ The weighted mean and working with grouped data
Measures of location/central tendency
§ Mean
§ Median
§ Mode
§ Percentiles
§ Quartiles
Measures of central tendency
the arithmetic mean
§ The mean of a data set is the average of all the data
values.
§ The mean provides a measure of central location for the
data.
§ If the data are for a sample, the mean is denoted by 𝑥̅
§ If the data are for a population, the mean is denoted by
the Greek letter µ.
the arithmetic mean
The arithmetic mean (mean) is the most common measure of
central tendency.
For a sample of size n:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
where n is the sample size and X_1, ..., X_n are the observed values.
the arithmetic mean
§ The most common measure of central tendency
§ Mean = sum of values divided by the number of values
§ Affected by extreme values (outliers)
Example (two dot plots on a 0 to 10 axis):
Data 1, 2, 3, 4, 5: $\text{Mean} = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$
Data 1, 2, 3, 4, 10: $\text{Mean} = \frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$
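The effect of an extreme value on the mean can be checked directly. The short Python sketch below uses the two five-value data sets from this slide; the function name `arithmetic_mean` is chosen here for illustration only.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # the largest value is replaced by an extreme one

mean_a = arithmetic_mean(without_outlier)
mean_b = arithmetic_mean(with_outlier)
print(mean_a)  # 3.0
print(mean_b)  # 4.0
```

Changing a single value from 5 to 10 moves the mean from 3 to 4, which is the sensitivity to outliers noted above.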
Example: Apartment rents
the median
§ The median is another measure of central location.
§ The median is the value in the middle when the data are arranged
in ascending order (smallest value to largest value).
§ The median is the measure of location most often reported for
annual income and property value data.
§ A few extremely large incomes or property values can inflate the
mean.
§ The sample median is denoted by $\tilde{x}$ and the population median
by $\tilde{\mu}$.
the median
§ In an ordered array, the median is the “middle” number (50%
above, 50% below)
[Figure: two dot plots on a 0 to 10 axis; Median = 4 in both]
§ Not affected by extreme values
§ Note that $\frac{n+1}{2}$ is NOT the value of the median,
only the position of the median in the ranked data.
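The difference between the position (n + 1)/2 and the median value itself can be illustrated with a small Python sketch. The sample values are hypothetical, and averaging the two middle values when n is even is the usual convention, assumed here because the slide does not state it.

```python
def median(values):
    """Return the middle value of the ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ranked[middle]
    # Even n: (n + 1) / 2 falls between two positions, so average those two values.
    return (ranked[middle - 1] + ranked[middle]) / 2


odd_sample = [7, 1, 3, 10, 4]      # hypothetical values
even_sample = [7, 1, 3, 10, 4, 5]  # hypothetical values

position = (len(odd_sample) + 1) / 2
print(position)             # 3.0 (a position in the ranking, not a data value)
print(median(odd_sample))   # 4
print(median(even_sample))  # 4.5
```

Replacing 10 with 1000 in `odd_sample` leaves the median at 4, since only the ranking of the values matters.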
Example: apartment rents
the mode
§ The mode of a data set is the value that occurs with
greatest frequency.
§ The greatest frequency can occur at two or more
different values.
§ If the data have exactly two modes, the data are
bimodal.
§ If the data have more than two modes, the data are
multimodal.
the mode
§ Value that occurs most often
§ Not affected by extreme values
§ Used for either numerical or categorical data
§ There may be no mode
§ There may be several modes
[Figure: two dot plots; the first, on a 0 to 14 axis, has Mode = 9; the second, on a 0 to 6 axis, has no mode]
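A minimal Python sketch of finding the mode, covering the cases listed above (one mode, several modes, no mode, categorical data). The function name `modes` and the example values are illustrative, and treating a data set in which every value occurs once as having no mode is a convention assumed here.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the greatest frequency, or [] if there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs exactly once: treated as "no mode"
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 3, 9, 9, 9, 12]))       # [9]
print(modes([0, 1, 2, 3, 4, 5, 6]))     # []
print(modes([1, 1, 2, 3, 3]))           # [1, 3]  (bimodal)
print(modes(["red", "blue", "red"]))    # ['red'] (categorical data)
```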
which measure to choose?
§ The mean is generally used, unless extreme values
(outliers) exist.
§ The median is then often used instead, since it is not
sensitive to extreme values. For example, median home
prices may be reported for a region because a few very
expensive homes would inflate the mean.
Percentiles
§ A percentile provides information about how the data
are spread over the interval from the smallest value to
the largest value.
§ Admission test scores for colleges and universities are
frequently reported in terms of percentiles.
Quartiles
§ Quartiles are specific percentiles that divide the data into four parts,
each containing approximately one-fourth, or 25%, of the observations
§ First Quartile = 25th Percentile
§ Second Quartile = 50th Percentile = Median
§ Third Quartile = 75th Percentile
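As a rough illustration, the quartiles can be computed with Python's standard `statistics.quantiles` function. The data values are hypothetical, and the function's default interpolation rule is an assumption: textbooks and software packages use slightly different percentile conventions, so results can differ a little for other data sets.

```python
import statistics

data = [12, 15, 20, 30, 45, 57, 70]  # hypothetical sample of seven values

# n=4 splits the ranked data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)

print(q1, q2, q3)               # 15.0 30.0 57.0
print(statistics.median(data))  # 30 (the second quartile is the median)
```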
the geometric mean
§ Geometric mean
§ Used to measure the rate of change of a variable over time
$$\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$$
§ Geometric mean rate of return
§ Measures the status of an investment over time
$$\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$$
§ Where R_i is the rate of return in time period i
the geometric mean
An investment of $100,000 declined to $50,000 at the end of
year one and rebounded to $100,000 at the end of year two:
X1 = $100,000, X2 = $50,000, X3 = $100,000
Year one: 50% decrease. Year two: 100% increase.
The overall two-year return is zero, since it started and ended
at the same level.
the geometric mean
Use the 1-year returns to compute the arithmetic mean and the
geometric mean:
Arithmetic mean rate of return (misleading result):
$$\bar{X} = \frac{(-0.5) + (1)}{2} = 0.25$$
Geometric mean rate of return (more accurate result):
$$\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$$
$$= [(1 + (-0.5)) \times (1 + (1))]^{1/2} - 1 = [(0.50) \times (2)]^{1/2} - 1 = 1^{1/2} - 1 = 0\%$$
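The comparison can be reproduced with a short Python sketch. The function names are illustrative; the returns are the two yearly returns from the example (a 50% decrease followed by a 100% increase).

```python
import math


def arithmetic_mean_return(returns):
    """Plain average of the per-period returns."""
    return sum(returns) / len(returns)


def geometric_mean_return(returns):
    """[(1 + R1) x (1 + R2) x ... x (1 + Rn)]^(1/n) - 1"""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


yearly_returns = [-0.5, 1.0]

arith = arithmetic_mean_return(yearly_returns)
geo = geometric_mean_return(yearly_returns)
print(arith)  # 0.25
print(geo)    # 0.0
```

The geometric mean return of 0 agrees with the investment ending where it started, while the arithmetic mean suggests a 25% yearly gain that never occurred.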
summary
Central Tendency
§ Arithmetic mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
§ Median: middle value in the ordered array
§ Mode: most frequently observed value
§ Geometric mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
Measures of variation
§ It is often desirable to consider measures of variability
(dispersion), as well as measures of location.
§ For example, in choosing supplier A or supplier B we might
consider not only the average delivery time for each, but
also the variability in delivery time for each.
§ Range
§ Interquartile range
§ Variance
§ Standard deviation
§ Coefficient of variation
Range
§ The range of a data set is the difference between the largest and
smallest data values.
§ It is the simplest measure of variation.
§ It is very sensitive to the smallest and largest data values.
$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$
Example (dot plot on a 0 to 14 axis; smallest value 1, largest value 13):
Range = 13 - 1 = 12
disadvantages of the range
§ Ignores the way in which data are distributed
[Figure: two differently distributed data sets on a 7 to 12 axis; Range = 12 - 7 = 5 for both]
§ Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
Interquartile range
§ Problems caused by outliers can be eliminated by using the
interquartile range.
§ The IQR discards the lowest 25% and the highest 25% of the values and
measures the range of the remaining middle 50%.
§ Interquartile range = 3rd quartile – 1st quartile = Q3 – Q1
Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
(each of the four intervals between these values contains 25% of the data)
Interquartile range = 57 – 30 = 27
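A small Python sketch contrasting the range and the interquartile range on the two 25-value data sets used earlier to show the range's sensitivity to outliers (the second set replaces the largest value, 5, with 120). The quartiles come from `statistics.quantiles` with its default rule, which is an assumption; other quartile conventions may give slightly different IQR values.

```python
import statistics


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Q3 - Q1, using the default quartile rule of statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

print(value_range(base), value_range(with_outlier))  # 4 119
print(interquartile_range(base) == interquartile_range(with_outlier))  # True
```

The single extreme value changes the range from 4 to 119 but leaves the interquartile range unchanged, because the extreme value lies outside the middle 50% of the data.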
standard deviation
§ The variance is a measure of variability that utilizes all
the data.
§ It is based on the difference between the value of each
observation ($x_i$) and the mean ($\bar{x}$ for a sample, µ for a
population).
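A minimal Python sketch of the variance and standard deviation, assuming the usual definitions: the sample variance divides the sum of squared deviations from the mean by n - 1, the population variance divides by N, and the standard deviation is the square root of the variance. The data values and function names are hypothetical.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the population mean, divided by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


class_sizes = [46, 54, 42, 46, 32]  # hypothetical data, mean = 44

s2 = sample_variance(class_sizes)
s = math.sqrt(s2)
print(s2)  # 64.0
print(s)   # 8.0
```

Unlike the range, every observation contributes to the result, which is what "utilizes all the data" means here.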
Example – mean class
Example – starting salary
comparing standard deviation
[Figure: dot plots of three data sets on an 11 to 21 axis]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.570
comparing standard deviation
[Figure: two distributions, one with a small standard deviation and one with a large standard deviation]
summary characteristics
§ The more the data are spread out, the greater the range,
interquartile range, variance, and standard deviation.
§ The more the data are concentrated, the smaller the
range, interquartile range, variance, and standard
deviation.
§ If the values are all the same (no variation), all these
measures will be zero.
§ None of these measures are ever negative.
Coefficient of variation
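The coefficient of variation is commonly defined as the standard deviation expressed as a percentage of the mean. A minimal Python sketch under that definition, using the apartment-rent summary values that appear later in these slides (sample mean 490.8, sample standard deviation 54.74); the function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


rent_cv = coefficient_of_variation(54.74, 490.8)
print(f"{rent_cv:.1f}%")  # 11.2%
```

Because it is a relative measure, it allows the variability of data sets with different means or units to be compared.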
Measures of distribution shape, relative location
and detecting outliers
§ Distribution Shape
§ z-Scores
§ Chebyshev’s Theorem
§ Empirical Rule
§ Detecting Outliers
Shape of distribution
§ Describes how data are distributed
§ Measures of shape
§ Symmetric or skewed
Left-skewed: Mean < Median
Symmetric: Mean = Median
Right-skewed: Median < Mean
Shape of distribution
[Figure: four example histograms, labelled Mean < Median, Median < Mean, Mean = Median and Median < Mean]
General descriptive statistics using
Microsoft Excel
1. Select Tools.
2. Select Data Analysis.
3. Select Descriptive Statistics and click OK.
4. Enter the cell range.
5. Check the Summary Statistics box.
6. Click OK.
z-scores
Chebyshev’s theorem
Example - Chebyshev’s theorem
For the test scores between 58 and 82:
$z = \frac{58 - 70}{5} = -2.4$ indicates that 58 is 2.4 standard deviations below the mean.
$z = \frac{82 - 70}{5} = 2.4$ indicates that 82 is 2.4 standard deviations above the mean.
Applying Chebyshev’s theorem with z = 2.4, we have:
$$1 - \frac{1}{z^2} = 1 - \frac{1}{(2.4)^2} = 0.826$$
At least 82.6% of the students must have test scores between
58 and 82.
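The calculation can be sketched in Python as follows. The mean of 70 and standard deviation of 5 are assumptions: they are the values implied by z = -2.4 at a score of 58 and z = 2.4 at a score of 82. The function names are illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


def chebyshev_lower_bound(z):
    """Minimum proportion of data within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem is informative only for z > 1")
    return 1 - 1 / z ** 2


mean, std_dev = 70, 5  # assumed values, implied by the z-scores in the example

z_low = z_score(58, mean, std_dev)
z_high = z_score(82, mean, std_dev)
bound = chebyshev_lower_bound(z_high)
print(z_low, z_high)    # -2.4 2.4
print(round(bound, 3))  # 0.826
```

The bound holds for any distribution shape, which is why it is stated as "at least" 82.6%.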
Empirical rule
Example – apartment rents
𝑥̅ = 490.8
𝑠 = 54.74
Detecting outliers
§ Sometimes a data set will have one or more
observations with unusually large or unusually small
values.
§ These extreme values are called outliers.
§ Experienced statisticians take steps to identify outliers
and then review each one carefully.
§ An outlier may be a data value that has been
incorrectly recorded. → If so, it can be corrected
before further analysis.
Detecting outliers
§ An outlier may also be from an observation that was
incorrectly included in the data set. → If so, it can be
removed.
§ An outlier may be an unusual data value that has been
recorded correctly and belongs in the data set. → If so, it
should remain.
§ Standardized values (z-scores) can be used to identify
outliers.
§ In using z-scores to identify outliers, we recommend
treating any data value with a z-score less than -3 or
greater than 3 as an outlier.
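A minimal Python sketch of the z-score rule, applied to the 25-value data set with the extreme value 120 used earlier in these slides. The function name and the use of the sample standard deviation (n - 1 divisor) are assumptions made for illustration.

```python
import statistics


def z_score_outliers(values, threshold=3):
    """Return the values whose z-score is below -threshold or above +threshold."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    if s == 0:
        return []  # all values identical: no variation, so no outliers
    return [x for x in values if abs((x - mean) / s) > threshold]


data = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

outliers = z_score_outliers(data)
print(outliers)  # [120]
```

Flagged values should still be reviewed one by one, as described above, before being corrected, removed or kept.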
Example – detecting outliers
Exploratory data analysis
§ Five-Number Summary
1. Minimum/smallest value
2. First quartile (Q1)
3. Median (Q2)
4. Third quartile (Q3)
5. Maximum/largest value
§ Box-and-Whisker Plot
Box-and-whisker plot
§ A box plot is a graphical summary of data that is based on a
five-number summary.
§ A key to the development of a box plot is the computation
of the median and the quartiles, Q1 and Q3. The
interquartile range, IQR = Q3 - Q1, is also used.
o A box is drawn with the ends of the box located at the first and
third quartiles. For the salary data, Q1 = 3465 and Q3 = 3600. This
box contains the middle 50% of the data.
o A vertical line is drawn in the box at the location of the median
(3505 for the salary data).
o By using the interquartile range, IQR = Q3 - Q1, limits are
located. The limits for the box plot are 1.5(IQR) below Q1 and
1.5(IQR) above Q3. Data outside these limits are considered outliers.
o The dashed lines are called whiskers. The whiskers are drawn from
the ends of the box to the smallest and largest values inside the limits.
o Finally, the location of each outlier is marked with an asterisk (*).
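The box plot limits for the salary data can be worked out with a short Python sketch. Q1 = 3465 and Q3 = 3600 are the quartiles quoted above; the function name and the four individual salary values are hypothetical and serve only to show how the limits are applied.

```python
def box_plot_limits(q1, q3):
    """Return (lower limit, upper limit): 1.5 * IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


lower, upper = box_plot_limits(3465, 3600)
print(lower, upper)  # 3262.5 3802.5

salaries = [3310, 3505, 3730, 3925]  # hypothetical observations
outliers = [x for x in salaries if x < lower or x > upper]
print(outliers)  # [3925]
```

Here IQR = 3600 - 3465 = 135, so each limit lies 1.5 × 135 = 202.5 away from its quartile.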
Box plot
Measures of association between two
variables
§ Covariance
§ Correlation coefficient
Covariance
Example - stereo and sound equipment
store
Scatter diagram and calculation of sample
covariance for the stereo and sound equipment store
Correlation coefficient
The correlation coefficient
[Figure: five scatter plots of Y against X, with r = -1, r = -0.6, r = 0, r = +1 and r = +0.3]
Example - stereo and sound equipment store
Perfect linear relationship
using Microsoft Excel
1. Select Tools/Data Analysis.
2. Choose Correlation from the selection menu.
3. Click OK.
4. Input the data range and select appropriate options.
5. Click OK to get the output.
[Figure: scatter plot of test scores, Test #1 score (70 to 100) on the horizontal axis and Test #2 score (70 to 100) on the vertical axis]
§ r = .733
§ There is a relatively strong positive linear relationship
between test score #1 and test score #2.
§ Students who scored high on the first test tended to
score high on the second test.
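For readers without Excel, the sample covariance and correlation coefficient can be sketched in plain Python. This assumes the usual definitions: the sample covariance is the sum of the products of the x and y deviations from their means divided by n - 1, and r is the covariance divided by the product of the two sample standard deviations. The data values and function names are hypothetical and are not taken from the slides.

```python
import math


def sample_covariance(x, y):
    """Sum of (x_i - mean_x) * (y_i - mean_y), divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


def sample_std(values):
    """Sample standard deviation (n - 1 divisor)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(x, y):
    """Correlation coefficient r = s_xy / (s_x * s_y)."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))


x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]            # hypothetical values
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]  # hypothetical values

print(sample_covariance(x, y))      # 11.0
print(round(correlation(x, y), 2))  # 0.93
```

A covariance of 11 depends on the units of x and y, whereas r always lies between -1 and +1; points lying exactly on an upward-sloping line give r = +1.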
The weighted mean and
working with grouped data
§ Weighted mean
§ Mean for grouped data
§ Variance for grouped data
§ Standard deviation for grouped data
Weighted mean
Grouped data
§ The weighted mean computation can be used to obtain
approximations of the mean, variance, and standard
deviation for the grouped data.
§ To compute the weighted mean, we treat the midpoint
of each class as though it were the mean of all
items in the class.
§ We compute a weighted mean of the class midpoints
using the class frequencies as weights.
§ Similarly, in computing the variance and standard
deviation, the class frequencies are used as weights.
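The grouped-data approximations described above can be sketched in Python. The class midpoints and frequencies are hypothetical, the function names are illustrative, and the n - 1 divisor for the sample variance is assumed from the usual definition.

```python
import math


def grouped_mean(midpoints, frequencies):
    """Weighted mean of class midpoints, with class frequencies as weights."""
    n = sum(frequencies)
    return sum(f * m for f, m in zip(frequencies, midpoints)) / n


def grouped_sample_variance(midpoints, frequencies):
    """Frequency-weighted squared deviations of the midpoints, divided by n - 1."""
    n = sum(frequencies)
    mean = grouped_mean(midpoints, frequencies)
    return sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints)) / (n - 1)


midpoints = [5, 15, 25]  # hypothetical class midpoints
frequencies = [2, 5, 3]  # hypothetical class frequencies (n = 10)

mean = grouped_mean(midpoints, frequencies)
variance = grouped_sample_variance(midpoints, frequencies)
print(mean)                          # 16.0
print(round(variance, 2))            # 54.44
print(round(math.sqrt(variance), 2)) # 7.38
```

These are approximations: the individual values inside each class are unknown, so every item is treated as if it equalled its class midpoint.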
Mean and sample variance for grouped
data
References
§ Statistics for Business and Economics, Anderson,
Sweeney, and Williams, West Publishing Company.
§ Statistics for Business and Economics,
South-Western/Thomson Learning.
§ Statistics for Managers Using Microsoft Excel, 5e, ©
2008 Pearson Prentice-Hall, Inc.