1.1 Descriptive Statistics
population
sample
GRAPHING CATEGORICAL DATA
• Bar chart: Useful for summarizing
and displaying the patterns in
categorical data; is created by
plotting all the categories in the data
on one axis and the frequency (or
relative frequency or percentages) of
occurrence of each category in the
data on the other axis. Either
horizontal or vertical bars of height
(or length) equal to the frequency (or
relative frequency or percentages)
are drawn.
• Pie chart: Suitable for representing
categorical data; used to show
percentages; the area of each slice is
proportional to the value of its category.
GRAPHING NUMERICAL DATA
• Histogram: The graphical
representation of a frequency
table; summarizes numerical
data grouped into classes;
bars are drawn vertically or
horizontally, with the area of
each bar proportional to the
frequency of the observations
falling into the class.
Data Frequency Table
The following 50 measurements were recorded as part of a
hydrological study of rainfall at a certain location.
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114
Frequency Distribution Table and Frequency Histogram
Table columns: Class limits | Class boundaries | Frequency | Cumulative frequency
$n = \sum f = 50$
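The table columns above can be sketched in Python for the 50 measurements. The class width of 5 and the first lower class limit of 100 are assumptions chosen to cover the observed range (100 to 134); the original table may use different classes. The function name `frequency_table` is hypothetical.

```python
# The 50 measurements from the data table.
measurements = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

FIRST_LOWER_LIMIT = 100  # assumption: first class starts at the minimum value
CLASS_WIDTH = 5          # assumption: classes 100-104, 105-109, ...


def frequency_table(data, first_lower, width):
    """Build rows of class limits, class boundaries, frequency, cumulative frequency."""
    rows = []
    cumulative = 0
    lower = first_lower
    while lower <= max(data):
        upper = lower + width - 1
        freq = sum(1 for x in data if lower <= x <= upper)
        cumulative += freq
        rows.append({
            "limits": (lower, upper),
            # Boundaries sit halfway between adjacent class limits.
            "boundaries": (lower - 0.5, upper + 0.5),
            "frequency": freq,
            "cumulative": cumulative,
        })
        lower += width
    return rows


table = frequency_table(measurements, FIRST_LOWER_LIMIT, CLASS_WIDTH)
for row in table:
    lo, hi = row["limits"]
    blo, bhi = row["boundaries"]
    print(f"{lo}-{hi}  {blo}-{bhi}  {row['frequency']:>2}  {row['cumulative']:>2}")
```

The frequency column gives the bar heights of the histogram, and the cumulative column plotted against the upper class boundaries gives the ogive.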
The Ogive (Cumulative Frequency)
Visualizing Distributions
Three Components:
• Center: a point near the middle of the distribution that might
serve as a typical value or a balance point for the distribution.
• Spread: how the data spread out around the center.
• Shape: the basic pattern of the plotted data, along with any
notable departures from that pattern.
Shapes of Distribution
Bell-shaped Uniform
Right-skewed Left-skewed
Bimodal U-shaped
I. Measure of Center
1. Mean
If the dataset contains n measurements labeled $x_1, x_2, \ldots, x_n$, then
the mean $\bar{x}$ (read as "x bar") is defined as
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
a. Population mean: μ
b. Sample mean, computed from a sample of n measurements: $\bar{x}$
2. Median
For any dataset, the median is the middle of the ordered array of
numerical values.
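• Arrange the n measurements in increasing order of value, in other
words, from smallest to largest.
• Compute $l = \frac{n+1}{2}$.
• Then the median M = the value of the lth measurement in the
ordered array of measurements. When n is even, l falls halfway
between two positions, and M is the average of the two
measurements on either side.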
3. Mode
The mode is the most frequently occurring data value.
a) If all the elements in the data set have the same frequency of
occurrence, then the data set is said to have no mode.
b) If the data set has one value that occurs more frequently than
the rest of the values, then the data set is said to be unimodal.
c) If two elements of the data set are tied for the highest
frequency of occurrence, then the data set is said to be
bimodal.
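The three cases above can be sketched as a small classifier. The function name `classify_mode` is hypothetical, and the "multimodal" label for more than two tied values is an assumption, since the list above stops at the bimodal case.

```python
from collections import Counter


def classify_mode(data):
    """Return (label, modes) following the no mode / unimodal / bimodal cases."""
    counts = Counter(data)
    top = max(counts.values())
    modes = sorted(value for value, count in counts.items() if count == top)
    # Case a: every distinct value occurs equally often.
    if len(counts) > 1 and len(modes) == len(counts):
        return "no mode", []
    # Case b: a single value occurs most often.
    if len(modes) == 1:
        return "unimodal", modes
    # Case c: exactly two values tie for the highest frequency.
    if len(modes) == 2:
        return "bimodal", modes
    # Assumption: more than two tied values is labeled "multimodal".
    return "multimodal", modes


print(classify_mode([1, 2, 3]))
print(classify_mode([1, 2, 2, 3]))
print(classify_mode([1, 1, 2, 2, 3]))
```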
Example 1
The AQI data for the year 2003, listed in order of the cities, are
given below. Solve for the mean and the median.
12, 8, 10, 5, 17, 19, 31, 11, 88, 11, 19, 37, 1, 2, 12
Mean = $\bar{x}$ = 18.87
Median = M = 12
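The two results of Example 1 can be checked with a minimal Python sketch. The helper names `mean` and `median` are illustrative. Averaging the two middle values when n is even is an assumption based on the usual convention; Example 1 itself has odd n = 15.

```python
def mean(values):
    """Arithmetic mean: the sum of the measurements divided by n."""
    return sum(values) / len(values)


def median(values):
    """Median via the position l = (n + 1) / 2 in the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    l = (n + 1) / 2  # 1-based position in the ordered array
    if l.is_integer():
        return ordered[int(l) - 1]
    # n is even: l lies halfway between two positions, so average them.
    lower = ordered[int(l) - 1]
    upper = ordered[int(l)]
    return (lower + upper) / 2


aqi = [12, 8, 10, 5, 17, 19, 31, 11, 88, 11, 19, 37, 1, 2, 12]
aqi_mean = mean(aqi)
aqi_median = median(aqi)
print(round(aqi_mean, 2), aqi_median)  # prints: 18.87 12
```

The single large value 88 pulls the mean well above the median, which is why the two measures of center differ so much here.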
Example 2
Mean transportation time for accidents in rural Alabama from
3,133 cases is 13.67 min, whereas that in urban Alabama in
2,065 cases is 8.97 min. The transportation time is defined as the
time to transport the vehicular accident victim from the site of
accident to the emergency care medical facility by the EMS
(emergency medical service) vehicle. What is the overall mean
transportation time for the state of Alabama?
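Because the two groups have different numbers of cases, the overall mean is a weighted mean: each group mean is multiplied by its number of cases, and the total is divided by the total number of cases. A short sketch with illustrative variable names:

```python
# Group sizes and group means from Example 2.
rural_cases, rural_mean = 3133, 13.67  # minutes
urban_cases, urban_mean = 2065, 8.97   # minutes

# Total transportation time across all cases, then divide by total cases.
total_time = rural_cases * rural_mean + urban_cases * urban_mean
total_cases = rural_cases + urban_cases
overall_mean = total_time / total_cases

print(f"{overall_mean:.2f} min")  # prints: 11.80 min
```

Simply averaging 13.67 and 8.97 would give 11.32 min, which understates the result because the rural group contains more cases.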
c) Standard deviation
a) Population: $\sigma = \sqrt{\sigma^2}$
b) Sample: $s = \sqrt{s^2}$
d) Coefficient of variation
A dimensionless quantity, the coefficient of variation is the ratio
between the standard deviation and the mean for the same
dataset, expressed as a percentage:
$$CV = \frac{s}{\bar{x}} \times 100\%$$
12, 8, 10, 5, 17, 19, 31, 11, 88, 11, 19, 37, 1, 2, 12
Mean 18.87
Variance 462.12
Standard deviation 21.50
n 15
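These summary values can be recomputed with Python's standard `statistics` module. This sketch assumes the sample formulas, which divide by n − 1; that choice reproduces the listed standard deviation of 21.50. The variable names are illustrative.

```python
import statistics

aqi = [12, 8, 10, 5, 17, 19, 31, 11, 88, 11, 19, 37, 1, 2, 12]

mean = statistics.mean(aqi)
sample_variance = statistics.variance(aqi)  # sample variance, divides by n - 1
sample_sd = statistics.stdev(aqi)           # square root of the sample variance
cv_percent = sample_sd / mean * 100         # coefficient of variation in percent

print(f"n = {len(aqi)}")
print(f"mean = {mean:.2f}")
print(f"variance = {sample_variance:.2f}")
print(f"standard deviation = {sample_sd:.2f}")
print(f"coefficient of variation = {cv_percent:.1f}%")
```

A coefficient of variation above 100% means the standard deviation exceeds the mean, reflecting the strong influence of the value 88.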
Significance of Standard deviation
Chebyshev’s Theorem
The proportion of any distribution that lies within k standard deviations of the mean is at
least $1 - \frac{1}{k^2}$, where k is any number greater than 1. The theorem applies to all
distributions of data.
Empirical (Normal) Rule:
The Empirical Rule can be used only for relatively mound-shaped data sets.
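As an illustration, the Chebyshev bound can be compared with the observed proportion in a data set, writing k for the number of standard deviations. The sketch reuses the AQI data from Example 1 and uses the sample standard deviation. The function names are hypothetical. The Empirical Rule percentages (about 68%, 95%, and 99.7% within 1, 2, and 3 standard deviations) are the usual statement of the rule, not values from the slides above.

```python
import statistics

# Usual Empirical Rule proportions for mound-shaped data (not from the source).
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}


def chebyshev_lower_bound(k):
    """Minimum proportion within k standard deviations, valid for k > 1."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


def proportion_within(data, k):
    """Observed proportion of values within k sample standard deviations."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    inside = [x for x in data if abs(x - m) <= k * s]
    return len(inside) / len(data)


aqi = [12, 8, 10, 5, 17, 19, 31, 11, 88, 11, 19, 37, 1, 2, 12]
for k in (2, 3):
    bound = chebyshev_lower_bound(k)
    observed = proportion_within(aqi, k)
    print(f"k={k}: Chebyshev >= {bound:.3f}, observed {observed:.3f}, "
          f"Empirical Rule ~ {EMPIRICAL_RULE[k]}")
```

The AQI data are strongly right-skewed, so only the Chebyshev bound is guaranteed to hold; the Empirical Rule figures are shown for comparison only.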
III. Measures of relative standing
1. The z-score
The sample z score of a value of x is a measure of relative
standing defined by
$$z = \frac{x - \bar{x}}{s}$$
The z-score measures the distance between an observation and the
mean, in units of the standard deviation.
z-scores between -2 and +2 are highly likely.
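A minimal sketch of the z-score, applied to the AQI data from Example 1. The function name `z_score` is hypothetical, and the sample standard deviation (dividing by n − 1) is assumed.

```python
import statistics


def z_score(x, data):
    """Distance of x from the sample mean, in units of the sample standard deviation."""
    return (x - statistics.mean(data)) / statistics.stdev(data)


aqi = [12, 8, 10, 5, 17, 19, 31, 11, 88, 11, 19, 37, 1, 2, 12]
z_88 = z_score(88, aqi)
z_12 = z_score(12, aqi)
print(f"z(88) = {z_88:.2f}")
print(f"z(12) = {z_12:.2f}")
```

The value 88 has a z-score above 3, well outside the -2 to +2 range, which marks it as an unusual observation in this data set.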
Example
Mean = 0.233
Median = average of the 30th and 31st ordered values = 0
Mode = 0
First quartile = average of the 15th and 16th ordered values = 0
Third quartile = average of the 45th and 46th ordered values = 0
Proportion defective of the sample = 14/60 = 0.0389
(relative
Sample variance = 0.2497
Sample standard deviation = 0.4997
Coefficient of variation = 214%
Assignment:
The accompanying specific gravity values for various wood types used
in construction appeared in the article "Bolted Connection Design
Values Based on European Yield Model" (J. of Structural Engr., 1993:
2169-2186).

Bin     Frequency
0.35    2
0.40    7
0.45    10
0.50    6
0.55    4
0.60    1
0.65    1
0.70    4
0.75    1

a. Construct a frequency histogram and describe the distribution by its
shape.
b. Determine the mean, median, mode, sample standard deviation,
sample variance, and coefficient of variation.