Descriptive Statistics: Numerical Methods
Descriptive Statistics
3.1 Describing Central Tendency
3.2 Measures of Variation
3.3 Percentiles, Quartiles
3.5 Grouped Data
Mode, Mo
The Mean
Population: $X_1, X_2, \ldots, X_N$
Sample: $x_1, x_2, \ldots, x_n$
Population Mean
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Sample Mean
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
The Median
The median Md is a value such that 50% of all measurements, after having been arranged in numerical order, lie above (or below) it
1. If the number of measurements is odd, the median is the middlemost measurement in the ordering
2. If the number of measurements is even, the median is the average of the two middlemost measurements in the ordering
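The mean and median definitions above can be checked with a minimal sketch using Python's standard `statistics` module. The data are the nine ordered values from the quartile example in these notes; the variable names are illustrative only.

```python
from statistics import mean, median

# Ordered sample reused from the quartile example in these notes (n = 9).
sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]

sample_mean = mean(sample)      # sum of the values divided by n
sample_median = median(sample)  # n is odd: the middlemost (5th) value

# Dropping the smallest value leaves n = 8, so the median becomes the
# average of the two middlemost values, 16 and 17.
even_sample = sample[1:]
even_median = median(even_sample)

print(sample_mean, sample_median, even_median)
```

The odd and even cases correspond to rules 1 and 2 above.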
The Mode
The mode Mo of a population or sample of measurements is the measurement that occurs most frequently
Modes are the values that are observed most often
Sometimes the highest frequency occurs at two or more values
If there are two modes, the data is bimodal
If there are more than two modes, the data is multimodal
When data are in classes, the class with the highest frequency is the modal class
The tallest box in the histogram
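As a small illustration of single and multiple modes, the sketch below uses `mode` and `multimode` from Python's `statistics` module (Python 3.8 or later). The first list is the quartile example data from these notes; the second list is hypothetical and chosen only to be bimodal.

```python
from statistics import mode, multimode

# Quartile example data from these notes: 16 occurs twice, all others once.
sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]
single_mode = mode(sample)

# Hypothetical data with two equally frequent values (bimodal).
bimodal_sample = [2, 3, 3, 4, 5, 5, 6]
all_modes = multimode(bimodal_sample)  # every value tied for the top frequency

print(single_mode, all_modes)
```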
Measures of Variation
Knowing the measures of central tendency is not enough
Both of the distributions below have identical measures of central tendency
Measures of Variation
Range: Largest minus the smallest measurement
Variance: The average of the squared deviations of all the population measurements from the population mean
Standard Deviation: The square root of the variance
The Range
Largest minus smallest
Measures the interval spanned by all the data
For Figure 3.13, largest repair time is 5 and smallest is 3
Range is 5 − 3 = 2 days
Variance
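Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$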
Standard Deviation
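Population standard deviation ($\sigma$): $\sigma = \sqrt{\sigma^2}$
Sample standard deviation ($s$): $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$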
$s^2 = \frac{\sum_{i=1}^{5} (x_i - \bar{x})^2}{5 - 1} = \frac{(30.8 - 31.26)^2 + (31.7 - 31.26)^2 + (30.1 - 31.26)^2 + (31.6 - 31.26)^2 + (32.1 - 31.26)^2}{4} = \frac{2.572}{4} = 0.643$
$s = \sqrt{s^2} = \sqrt{0.643} = 0.8019$
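The worked example above can be reproduced with a short sketch. It computes the sample variance by hand with the n − 1 divisor and compares it with `variance` and `stdev` from Python's `statistics` module; the variable names are illustrative.

```python
import math
from statistics import mean, stdev, variance

# The five mileages (mpg) from the worked example.
mileages = [30.8, 31.7, 30.1, 31.6, 32.1]

n = len(mileages)
x_bar = mean(mileages)                                  # 31.26
sum_sq_dev = sum((x - x_bar) ** 2 for x in mileages)    # about 2.572

s2_manual = sum_sq_dev / (n - 1)   # sample variance, divisor n - 1
s_manual = math.sqrt(s2_manual)    # sample standard deviation

# Library versions; both use the n - 1 divisor as well.
s2_library = variance(mileages)
s_library = stdev(mileages)

print(round(s2_manual, 3), round(s_manual, 4))
```

For a population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by N instead.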
68.26% of all individual cars will have mileages in the range $[\bar{x} \pm s]$ = [31.6 ± 0.8] = [30.8, 32.4] mpg
95.44% of all individual cars will have mileages in the range $[\bar{x} \pm 2s]$ = [31.6 ± 1.6] = [30.0, 33.2] mpg
99.73% of all individual cars will have mileages in the range $[\bar{x} \pm 3s]$ = [31.6 ± 2.4] = [29.2, 34.0] mpg
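The three intervals above can be rebuilt with a minimal sketch. It assumes the mileages are approximately normally distributed, which is what these percentages rely on, and it uses `NormalDist` from Python's `statistics` module to compute the coverage of each interval. The computed coverages (about 68.27%, 95.45%, 99.73%) differ slightly from the quoted 68.26% and 95.44%, which come from rounded normal-table values.

```python
from statistics import NormalDist

x_bar, s = 31.6, 0.8  # rounded mean and standard deviation quoted above

standard_normal = NormalDist()  # mean 0, standard deviation 1
intervals = {}
coverage = {}
for k in (1, 2, 3):
    # Interval [x_bar - k*s, x_bar + k*s], rounded to one decimal place.
    intervals[k] = (round(x_bar - k * s, 1), round(x_bar + k * s, 1))
    # Probability that a normal measurement falls within k standard deviations.
    coverage[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, intervals[k], round(100 * coverage[k], 2))
```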
Chebyshev's Theorem
Let $\mu$ and $\sigma$ be a population's mean and standard deviation; then for any value $k > 1$
At least $100(1 - 1/k^2)\%$ of the population measurements lie in the interval $[\mu - k\sigma, \mu + k\sigma]$
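To make the bound concrete, here is a small sketch that evaluates 100(1 − 1/k²)% for a few values of k. The function name `chebyshev_lower_bound` is hypothetical and used only for illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of measurements within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 100 * (1 - 1 / k ** 2)

bound_k2 = chebyshev_lower_bound(2)  # at least 75% within 2 standard deviations
bound_k3 = chebyshev_lower_bound(3)  # at least about 88.89% within 3

print(bound_k2, round(bound_k3, 2))
```

Unlike the normal-distribution percentages, these bounds hold for any population shape, which is why they are weaker.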
Coefficient of Variation
Measures the size of the standard deviation relative to the size of the mean
Coefficient of variation = (standard deviation / mean) × 100%
Used to:
  Compare the relative variabilities of values about the mean
  Compare the relative variability of populations or samples with different means and different standard deviations
  Measure risk
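A minimal sketch of the coefficient of variation follows, using the five mileages from the variance example. The second data set is hypothetical: it is the same sample rescaled by 10, included only to show that the coefficient of variation does not depend on the measurement scale. The function name is illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(data) / mean(data) * 100

mileages = [30.8, 31.7, 30.1, 31.6, 32.1]   # from the variance example
rescaled = [308, 317, 301, 316, 321]        # hypothetical: same data times 10

cv_mileages = coefficient_of_variation(mileages)
cv_rescaled = coefficient_of_variation(rescaled)

print(round(cv_mileages, 3), round(cv_rescaled, 3))
```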
Quartiles
Quartiles split the ranked data into 4 segments with an equal number of values per segment
[Figure: ranked data split into four segments of 25% each by Q1, Q2, and Q3]
The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger
Q2 is the same as the median (50% are smaller, 50% are larger)
Only 25% of the observations are greater than the third quartile, Q3
Quartile Formulas
Find a quartile by determining the value in the appropriate position in the ranked data, where
First quartile position: Q1 = (n+1)/4
Second quartile position: Q2 = (n+1)/2 (the median position)
Third quartile position: Q3 = 3(n+1)/4
Quartiles
Example: Find the first quartile
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9) Q1 is in the (9+1)/4 = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values: Q1 = (12 + 13)/2 = 12.5
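The position rule can be sketched as follows. The helper `value_at_position` is hypothetical: it reads the value at a 1-based, possibly fractional, position and interpolates linearly between neighbours. For a position ending in .5 this is the "halfway" rule used above; for other fractions, linear interpolation is an assumption rather than something these notes specify. The result is compared with `statistics.quantiles` using `method="exclusive"`, which is based on the same (n+1) positions.

```python
from statistics import quantiles

def value_at_position(ordered, position):
    """Value at a 1-based position in ordered data, assuming 1 <= position <= n."""
    lower_rank = int(position)          # whole part of the position
    fraction = position - lower_rank    # fractional part, 0 if the position is whole
    if fraction == 0:
        return ordered[lower_rank - 1]
    below = ordered[lower_rank - 1]
    above = ordered[lower_rank]
    return below + fraction * (above - below)

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
n = len(data)

q1 = value_at_position(data, (n + 1) / 4)       # position 2.5
q2 = value_at_position(data, (n + 1) / 2)       # position 5, the median
q3 = value_at_position(data, 3 * (n + 1) / 4)   # position 7.5

library_quartiles = quantiles(data, n=4, method="exclusive")

print(q1, q2, q3, library_quartiles)
```

Other software may use different quartile conventions, so results can differ slightly for small samples.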
The five-number summary:
  The smallest measurement
  The first quartile, Q1
  The median, Md
  The third quartile, Q3
  The largest measurement
Displayed visually using a box-and-whiskers plot
[Figure: box-and-whiskers plot with minimum = 12, Q1 = 30, median = 45, Q3 = 57, maximum = 70]
Interquartile range = Q3 − Q1 = 57 − 30 = 27
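A short sketch of the five-number summary and interquartile range follows. The raw data behind the plot above are not given, so the function is applied to the nine-value quartile example from these notes instead; the plot's own interquartile range is recomputed from its five stated values. The function name `five_number_summary` is illustrative, and the quartiles use the (n+1)-based positions via `method="exclusive"`.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using (n+1)-based quartile positions."""
    ordered = sorted(data)
    q1, md, q3 = quantiles(ordered, n=4, method="exclusive")
    return ordered[0], q1, md, q3, ordered[-1]

# Values read from the box-and-whiskers plot above.
plot_summary = (12, 30, 45, 57, 70)
plot_iqr = plot_summary[3] - plot_summary[1]

# Quartile example data from these notes.
sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]
sample_summary = five_number_summary(sample)
sample_iqr = sample_summary[3] - sample_summary[1]

print(plot_iqr, sample_summary, sample_iqr)
```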
Weighted Means
Sometimes, some measurements are more important than others
Assign numerical weights to the data
Weights measure relative importance of the value
Weighted mean $= \frac{\sum w_i x_i}{\sum w_i}$
where $w_i$ is the weight assigned to the value $x_i$
Continued
Want the mean unemployment rate for the U.S.
Calculate it as a weighted mean, so that the bigger the region, the more heavily it counts in the mean
The data values are the regional unemployment rates
The weights are the sizes of the regional labor forces
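The weighting idea can be sketched as below. The regional figures are not reproduced in these notes, so the rates and labor-force sizes in the code are hypothetical and do not correspond to the actual U.S. data; only the method is illustrated.

```python
# Hypothetical regional unemployment rates (percent).
rates = [4.0, 5.0, 6.0]
# Hypothetical regional labor-force sizes (millions), used as weights.
labor_forces = [50.0, 30.0, 20.0]

# Weighted mean: sum of weight * value divided by the sum of the weights.
weighted_mean = sum(w * x for w, x in zip(labor_forces, rates)) / sum(labor_forces)

# Unweighted mean treats every region as equally important.
unweighted_mean = sum(rates) / len(rates)

print(weighted_mean, unweighted_mean)
```

In this made-up case the largest region has the lowest rate, so the weighted mean falls below the unweighted one.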
Continued
Note that the unweighted mean is 4.55%, which underestimates the true rate by 0.03 percentage points
That is, 0.0003 × 144.7 million = 43,410 workers
Descriptive Statistics for Grouped Data
Data already categorized into a frequency distribution or a histogram is called grouped data
Can calculate the mean and variance even when the raw data is not available
Calculations are slightly different for data from a sample and data from a population
$\bar{X} = \frac{\sum fX}{n} = \frac{75.5}{25} = 3.02$
$\text{Median} = L + \frac{(n/2) - m}{f} \times c$
Where
  L = Lower limit of the median class
  n = Total number of observations = $\sum f$
  m = Cumulative frequency preceding the median class
  f = Frequency of the median class
  c = Class interval of the median class
$\text{Median} = 2 + \frac{(25/2) - 5}{8} \times 1 = 2.9375$
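The grouped-data median formula can be written as a small function. This is a sketch; the function and parameter names are illustrative and simply mirror the symbols defined above, and the inputs are the values from the worked example (L = 2, n = 25, m = 5, f = 8, c = 1).

```python
def grouped_median(lower_limit, n, cum_freq_before, class_freq, class_width):
    """Median of grouped data: L + ((n/2 - m) / f) * c."""
    return lower_limit + ((n / 2 - cum_freq_before) / class_freq) * class_width

median_estimate = grouped_median(
    lower_limit=2,        # L: lower limit of the median class
    n=25,                 # total number of observations
    cum_freq_before=5,    # m: cumulative frequency preceding the median class
    class_freq=8,         # f: frequency of the median class
    class_width=1,        # c: class interval of the median class
)

print(median_estimate)
```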
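$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times c$
Where
  L = Lower limit of the modal class
  $d_1 = f_1 - f_0$
  $d_2 = f_1 - f_2$
  $f_1$ = Frequency of the modal class
  $f_0$ = Frequency preceding the modal class
  $f_2$ = Frequency succeeding the modal class
  c = Class interval of the modal class
Example: L = 2, $d_1 = f_1 - f_0 = 8 - 4 = 4$, $d_2 = f_1 - f_2 = 8 - 7 = 1$, c = 1
Hence $\text{Mode} = 2 + \frac{4}{4 + 1} \times 1 = 2.8$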
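The grouped-data mode calculation can be sketched the same way. The function and parameter names are illustrative; the inputs are the values from the example above (L = 2, modal class frequency 8, preceding frequency 4, succeeding frequency 7, class interval 1).

```python
def grouped_mode(lower_limit, freq_before, freq_modal, freq_after, class_width):
    """Mode of grouped data: L + d1 / (d1 + d2) * c."""
    d1 = freq_modal - freq_before   # excess over the preceding class
    d2 = freq_modal - freq_after    # excess over the succeeding class
    return lower_limit + d1 / (d1 + d2) * class_width

mode_estimate = grouped_mode(
    lower_limit=2, freq_before=4, freq_modal=8, freq_after=7, class_width=1
)

print(round(mode_estimate, 2))
```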
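Sample mean: $\bar{X} = \frac{\sum fX}{n} = \frac{1040}{60} = 17.333$ (cell F10)
Sample standard deviation: $s = \sqrt{\frac{\sum f(X - \bar{X})^2}{n - 1}} = \sqrt{\frac{2448.33}{59}} = 6.44$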
Practice Problem
Q. 3.47, p. 154: calculate the sample mean, variance and standard deviation.
Age (Years)   Frequency
28-32         1
33-37         3
38-42         3
43-47         13
48-52         14
53-57         12
58-62         9
63-67         1
68-72         3
73-77         1
Solution
Midpoint   Freq   M*f   (M - mean)**2   F*diff^2
30         1      30    462.25          462.25
35         3      105   272.25          816.75
40         3      120   132.25          396.75
45         13     585   42.25           549.25
50         14     700   2.25            31.5
55         12     660   12.25           147
60         9      540   72.25           650.25
65         1      65    182.25          182.25
70         3      210   342.25          1026.75
75         1      75    552.25          552.25
s = 9.033835
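The solution can be checked with a short sketch that computes the grouped sample mean, variance, and standard deviation from the class midpoints and frequencies given in the practice problem. Variable names are illustrative.

```python
import math

# Class midpoints and frequencies from the practice problem.
midpoints = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75]
frequencies = [1, 3, 3, 13, 14, 12, 9, 1, 3, 1]

n = sum(frequencies)  # total number of observations

# Grouped sample mean: sum of f * M divided by n.
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Sum of f * (M - mean)^2, then the sample variance with divisor n - 1.
sum_sq_dev = sum(f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints))
grouped_variance = sum_sq_dev / (n - 1)
grouped_sd = math.sqrt(grouped_variance)

print(n, grouped_mean, sum_sq_dev, round(grouped_variance, 4), round(grouped_sd, 6))
```

The standard deviation agrees with the value s = 9.033835 stated above.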