Descriptive Statistics: Numerical Methods

3.1 Describing Central Tendency
3.2 Measures of Variation
3.3 Percentiles, Quartiles
3.5 Grouped Data
Describing Central Tendency

In addition to describing the shape of a distribution, we want to describe the data set's central tendency.
A measure of central tendency represents the center or middle of the data.
Parameters and Statistics

A population parameter is a number calculated from all the population measurements that describes some aspect of the population.
A sample statistic is a number calculated using the sample measurements that describes some aspect of the sample.
Measures of Central Tendency

Mean, μ: the average or expected value
Median, Md: the value of the middle point of the ordered measurements
Mode, Mo: the most frequent value
The Mean
Population: $X_1, X_2, \ldots, X_N$
Sample: $x_1, x_2, \ldots, x_n$

Population Mean
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$

Sample Mean
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
The Sample Mean

For a sample of size n, the sample mean is defined as
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
and is a point estimate of the population mean $\mu$

It is the value to expect, on average and in the long run
Example 3.1: The Car Mileage Case

Example 3.1: Sample mean for the first five car mileages from Table 3.1: 30.8, 31.7, 30.1, 31.6, 32.1
$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5} = \frac{30.8 + 31.7 + 30.1 + 31.6 + 32.1}{5} = \frac{156.3}{5} = 31.26$
The Median
The median Md is a value such that 50% of all measurements, after having been arranged in numerical order, lie above (or below) it
1. If the number of measurements is odd, the median is the middlemost measurement in the ordering
2. If the number of measurements is even, the median is the average of the two middlemost measurements in the ordering
Example: Car Mileage Case

Example 3.1: First five observations from Table 3.1: 30.8, 31.7, 30.1, 31.6, 32.1
In order: 30.1, 30.8, 31.6, 31.7, 32.1
The number of observations is odd, so the median is the middle one, 31.6
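As a quick check of the mean and median examples for the five mileages, here is a minimal sketch using Python's standard `statistics` module. The variable names are illustrative and not part of the original slides.

```python
from statistics import mean, median

# First five car mileages from Table 3.1 (as listed in Example 3.1)
mileages = [30.8, 31.7, 30.1, 31.6, 32.1]

sample_mean = mean(mileages)      # sum of the values divided by n
sample_median = median(mileages)  # middle value of the sorted data (n is odd)

print(f"sample mean   = {sample_mean:.2f}")
print(f"sample median = {sample_median}")
```

With an even number of values, `median` averages the two middle values, which matches rule 2 for the median.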
The Mode
The mode Mo of a population or sample of measurements is the measurement that occurs most frequently
Modes are the values that are observed most often
Sometimes the highest frequency occurs at two or more values
If there are two modes, the data are bimodal
If there are more than two modes, the data are multimodal
When data are in classes, the class with the highest frequency is the modal class
The modal class is the tallest box in the histogram
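The unimodal, bimodal, and multimodal cases can be sketched with `statistics.multimode`, which returns every value tied for the highest frequency. The data set and the helper name `describe_modes` are hypothetical, invented only for illustration.

```python
from statistics import multimode

def describe_modes(data):
    """Return the list of modes and a label for how many there are."""
    modes = multimode(data)  # all values tied for the highest frequency
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label

# Hypothetical measurements, invented for illustration only
data = [2, 3, 3, 4, 5, 5, 6]
modes, label = describe_modes(data)
print(modes, label)
```

Note that if every value occurs exactly once, `multimode` returns all of them, so this sketch would label such data "multimodal" even though no value really stands out.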
Histogram Describing the 50 Mileages
Relationships Among Mean, Median and Mode
Measures of Variation
Knowing the measures of central tendency is not enough
Two distributions can have identical measures of central tendency yet differ in how spread out the measurements are
Measures of Variation
Range: largest minus the smallest measurement
Variance: the average of the squared deviations of all the population measurements from the population mean
Standard Deviation: the square root of the variance
The Range
Largest minus smallest
Measures the interval spanned by all the data
For Figure 3.13, largest repair time is 5 and smallest is 3
Range is 5 - 3 = 2 days
Population Variance and Standard Deviation

The population variance ($\sigma^2$) is the average of the squared deviations of the individual population measurements from the population mean ($\mu$)
The population standard deviation ($\sigma$) is the positive square root of the population variance
Variance
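For a population of size N, the population variance $\sigma^2$ is:

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2}{N}$

For a sample of size n, the sample variance $s^2$ is:

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$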
Standard Deviation
Population standard deviation ($\sigma$): $\sigma = \sqrt{\sigma^2}$
Sample standard deviation (s): $s = \sqrt{s^2}$
Example: Chris's Class Sizes This Semester

Data points are: 60, 41, 15, 30, 34
Mean is 36
Variance is:
$\sigma^2 = \frac{(60 - 36)^2 + (41 - 36)^2 + (15 - 36)^2 + (30 - 36)^2 + (34 - 36)^2}{5} = \frac{576 + 25 + 441 + 36 + 4}{5} = \frac{1082}{5} = 216.4$
Standard deviation is:

$\sigma = \sqrt{216.4} = 14.71$
Example: Sample Variance and Standard Deviation

Example 3.7: data for the first five car mileages from Table 3.1 are 30.8, 31.7, 30.1, 31.6, 32.1
The sample mean is 31.26
$s^2 = \frac{\sum_{i=1}^{5} (x_i - \bar{x})^2}{5 - 1} = \frac{(30.8 - 31.26)^2 + (31.7 - 31.26)^2 + (30.1 - 31.26)^2 + (31.6 - 31.26)^2 + (32.1 - 31.26)^2}{4} = \frac{2.572}{4} = 0.643$
$s = \sqrt{s^2} = \sqrt{0.643} = 0.8019$
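The difference between dividing by N (population) and by n - 1 (sample) can be checked with the standard `statistics` module, which has separate functions for each case. This is a minimal sketch reusing the two data sets from the examples; the variable names are illustrative.

```python
from statistics import pvariance, pstdev, variance, stdev

# Population example: the five class sizes (squared deviations divided by N)
class_sizes = [60, 41, 15, 30, 34]
pop_var = pvariance(class_sizes)
pop_sd = pstdev(class_sizes)

# Sample example: first five car mileages from Table 3.1
# (squared deviations divided by n - 1)
mileages = [30.8, 31.7, 30.1, 31.6, 32.1]
samp_var = variance(mileages)
samp_sd = stdev(mileages)

print(f"population variance = {pop_var:.1f}, standard deviation = {pop_sd:.2f}")
print(f"sample variance     = {samp_var:.3f}, standard deviation = {samp_sd:.4f}")
```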
The Empirical Rule for Normal Populations

If a population has mean $\mu$ and standard deviation $\sigma$ and is described by a normal curve, then
68.26% of the population measurements lie within one standard deviation of the mean: $[\mu - \sigma, \mu + \sigma]$
95.44% of the population measurements lie within two standard deviations of the mean: $[\mu - 2\sigma, \mu + 2\sigma]$
99.73% of the population measurements lie within three standard deviations of the mean: $[\mu - 3\sigma, \mu + 3\sigma]$
The Empirical Rule and Tolerance Intervals
Example 3.9: The Car Mileage Case

Continued
68.26% of all individual cars will have mileages in the range $[\bar{x} \pm s] = [31.6 \pm 0.8] = [30.8, 32.4]$ mpg
95.44% of all individual cars will have mileages in the range $[\bar{x} \pm 2s] = [31.6 \pm 1.6] = [30.0, 33.2]$ mpg
99.73% of all individual cars will have mileages in the range $[\bar{x} \pm 3s] = [31.6 \pm 2.4] = [29.2, 34.0]$ mpg
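The three tolerance intervals in Example 3.9 follow directly from the sample mean and standard deviation. The sketch below recomputes them, assuming the rounded values 31.6 and 0.8 used in the example; the percentages are the ones stated by the Empirical Rule, not computed here.

```python
x_bar = 31.6  # sample mean of the mileages, as used in Example 3.9
s = 0.8       # sample standard deviation, as used in Example 3.9

# Percentages stated by the Empirical Rule for k = 1, 2, 3 standard deviations
coverage = {1: 68.26, 2: 95.44, 3: 99.73}

intervals = {}
for k, percent in coverage.items():
    low = round(x_bar - k * s, 1)   # lower end: mean minus k standard deviations
    high = round(x_bar + k * s, 1)  # upper end: mean plus k standard deviations
    intervals[k] = (low, high)
    print(f"{percent}% of mileages in [{low}, {high}] mpg")
```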
Chebyshev's Theorem
Let $\mu$ and $\sigma$ be a population's mean and standard deviation; then for any value k > 1, at least $100(1 - 1/k^2)\%$ of the population measurements lie in the interval $[\mu - k\sigma, \mu + k\sigma]$
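To get a feel for the bound in Chebyshev's Theorem, the following sketch evaluates 100(1 - 1/k^2) for a few values of k. The function name `chebyshev_lower_bound` is an illustrative choice, not from the slides.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of measurements within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 100 * (1 - 1 / k**2)

for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.2f}% of measurements")
```

Unlike the Empirical Rule, this bound holds for any population shape, which is why it is weaker: at least 75% within two standard deviations, compared with 95.44% for a normal population.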
Coefficient of Variation
Measures the size of the standard deviation relative to the size of the mean
Coefficient of variation = (standard deviation / mean) × 100%
Used to:
  Compare the relative variabilities of values about the mean
  Compare the relative variability of populations or samples with different means and different standard deviations
  Measure risk
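As an illustration of the coefficient of variation, the sketch below applies the formula to the five car mileages from Table 3.1. It assumes the sample standard deviation is the one intended; the function name is illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation expressed as a percentage of the sample mean."""
    return stdev(data) / mean(data) * 100

# First five car mileages from Table 3.1
mileages = [30.8, 31.7, 30.1, 31.6, 32.1]
cv = coefficient_of_variation(mileages)
print(f"coefficient of variation = {cv:.2f}%")
```

Because the result is a percentage with no units, it can be compared across data sets measured on different scales.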
Percentiles, Quartiles, and Box-and-Whiskers Displays

For a set of measurements arranged in increasing order, the pth percentile is a value such that p percent of the measurements fall at or below the value and (100 - p) percent of the measurements fall at or above the value
The first quartile Q1 is the 25th percentile
The second quartile (or median) is the 50th percentile
The third quartile Q3 is the 75th percentile
The interquartile range IQR is Q3 - Q1
Quartiles
Quartiles split the ranked data into 4 segments with an equal number of values per segment
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger
Q2 is the same as the median (50% are smaller, 50% are larger)
Only 25% of the observations are greater than the third quartile
Quartile Formulas
Find a quartile by determining the value in the appropriate position in the ranked data, where
First quartile position: Q1 = (n+1)/4
Second quartile position: Q2 = (n+1)/2 (the median position)
Third quartile position: Q3 = 3(n+1)/4
where n is the number of observed values
Quartiles
Example: Find the first quartile
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values: Q1 = 12.5
Q1 and Q3 are measures of noncentral location
Q2 = median, a measure of central tendency
Quartiles (continued)
Example:
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data, so Q1 = 12.5
Q2 is in the (9+1)/2 = 5th position of the ranked data, so Q2 = median = 16
Q3 is in the 3(9+1)/4 = 7.5 position of the ranked data, so Q3 = 19.5
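The position formulas can be turned into a short routine. The sketch below is one possible reading: it interpolates linearly between neighbouring values when a position is fractional, which reproduces the "halfway between" rule for 2.5 and 7.5 but is an assumption for other fractions. The helper names are hypothetical, and at least three values are assumed so that every position falls inside the data.

```python
from statistics import quantiles

def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    lower = int(position)        # whole part of the position
    fraction = position - lower  # how far to move toward the next value
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

def quartiles(data):
    """Q1, Q2, Q3 from the (n+1)/4, (n+1)/2, 3(n+1)/4 positions."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 3:
        raise ValueError("this sketch assumes at least three values")
    q1 = value_at_position(ordered, (n + 1) / 4)
    q2 = value_at_position(ordered, (n + 1) / 2)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q2, q3

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3, "IQR =", q3 - q1)

# Cross-check: the default "exclusive" method also uses (n + 1)-based positions
print(quantiles(data, n=4))
```

Other software may use different quartile conventions, so results can differ slightly from this position method on the same data.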
Five Number Summary

1. The smallest measurement
2. The first quartile, Q1
3. The median, Md
4. The third quartile, Q3
5. The largest measurement
Displayed visually using a box-and-whiskers plot
Five Number Summary

Example:
Minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, Maximum = 70
25% of the measurements fall in each of the four segments between these values
Interquartile range = 57 - 30 = 27
Weighted Means
Sometimes, some measurements are more important than others
Assign numerical weights to the data
Weights measure relative importance of the value
Calculate weighted mean as
$\frac{\sum_i w_i x_i}{\sum_i w_i}$
where $w_i$ is the weight assigned to the ith measurement $x_i$
Example: Weighted Mean

June 2001 unemployment rates by census region
Northeast, 26.9 million in civilian labor force, 4.1% unemployment rate
South, 50.6 million, 4.7% unemployment
Midwest, 34.7 million, 4.4% unemployment
West, 32.5 million, 5.0% unemployment
Want the mean unemployment rate for the US
Continued
Want the mean unemployment rate for the U.S.
Calculate it as a weighted mean, so that the bigger the region, the more heavily it counts in the mean
The data values are the regional unemployment rates
The weights are the sizes of the regional labor forces

$\frac{(26.9 \times 4.1) + (50.6 \times 4.7) + (34.7 \times 4.4) + (32.5 \times 5.0)}{26.9 + 50.6 + 34.7 + 32.5} = \frac{663.29}{144.7} = 4.58\%$

Note that the unweighted mean is 4.55%, which underestimates the true rate by 0.03 percentage points
That is, 0.0003 × 144.7 million = 43,410 workers
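The weighted and unweighted means for the four census regions can be reproduced in a few lines. This is a minimal sketch with illustrative variable names; the figures are the June 2001 values given in the example.

```python
# Regional civilian labor forces in millions (the weights)
labor_force = [26.9, 50.6, 34.7, 32.5]  # Northeast, South, Midwest, West
# Regional unemployment rates in percent (the data values)
rates = [4.1, 4.7, 4.4, 5.0]

# Weighted mean: sum of weight * value divided by the sum of the weights
weighted = sum(w * x for w, x in zip(labor_force, rates)) / sum(labor_force)

# Unweighted mean: every region counts equally
unweighted = sum(rates) / len(rates)

print(f"weighted mean   = {weighted:.2f}%")
print(f"unweighted mean = {unweighted:.2f}%")
```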
Descriptive Statistics for Grouped Data
Data already categorized into a frequency distribution or a histogram is called grouped data
Can calculate the mean and variance even when the raw data is not available
Calculations are slightly different for data from a sample and data from a population
Mean for Grouped Data Example

Find the arithmetic mean for the following continuous frequency distribution:
Class   Frequency
0-1     1
1-2     4
2-3     8
3-4     7
4-5     3
5-6     2
Solution for the Example

Class    X (midpoint)   f     fX
0-1      0.5            1     0.5
1-2      1.5            4     6.0
2-3      2.5            8     20.0
3-4      3.5            7     24.5
4-5      4.5            3     13.5
5-6      5.5            2     11.0
Totals                  25    75.5
Mean                          3.02
Applying the formula
$\bar{X} = \frac{\sum fX}{n} = \frac{75.5}{25} = 3.02$
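The grouped-mean calculation can be sketched as follows: each class is represented by its midpoint, weighted by its frequency. The `(lower, upper, frequency)` tuple layout and the function name are assumptions made for this illustration.

```python
# Each class as (lower limit, upper limit, frequency)
classes = [(0, 1, 1), (1, 2, 4), (2, 3, 8), (3, 4, 7), (4, 5, 3), (5, 6, 2)]

def grouped_mean(classes):
    """Mean of grouped data: sum of f * midpoint divided by the total frequency."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n

print(f"grouped mean = {grouped_mean(classes):.2f}")
```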
Median for Grouped Data

Formula for Median is given by
$\text{Median} = L + \frac{(n/2) - m}{f} \times c$
Where
L = Lower limit of the median class
n = Total number of observations = $\sum f$
m = Cumulative frequency preceding the median class
f = Frequency of the median class
c = Class interval of the median class
Median for Grouped Data Example

Find the median for the following continuous frequency distribution:

Class   Frequency   Cumulative Frequency
0-1     1           1
1-2     4           5
2-3     8           13
3-4     7           20
4-5     3           23
5-6     2           25
Total   25

Substituting the relevant values in the formula $\text{Median} = L + \frac{(n/2) - m}{f} \times c$, we have
$\text{Median} = 2 + \frac{(25/2) - 5}{8} \times 1 = 2.9375$
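The grouped-median formula can be sketched in code by walking through the classes until the cumulative frequency reaches n/2. The tuple layout and function name below are assumptions for illustration, not part of the slides.

```python
# Each class as (lower limit, upper limit, frequency)
classes = [(0, 1, 1), (1, 2, 4), (2, 3, 8), (3, 4, 7), (4, 5, 3), (5, 6, 2)]

def grouped_median(classes):
    """Median = L + ((n/2 - m) / f) * c, using the first class that reaches n/2."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("classes must contain at least one observation")
    cumulative = 0  # m: cumulative frequency preceding the current class
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:  # this is the median class
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f

print(f"grouped median = {grouped_median(classes)}")
```

For the distribution above, the median class is 2-3 (cumulative frequency 13 is the first to reach 12.5), so L = 2, m = 5, f = 8, and c = 1.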
Mode for Grouped Data

$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times c$
Where
L = Lower limit of the modal class
$d_1 = f_1 - f_0$
$d_2 = f_1 - f_2$
$f_1$ = Frequency of the modal class
$f_0$ = Frequency preceding the modal class
$f_2$ = Frequency succeeding the modal class
c = Class interval of the modal class
Mode for Grouped Data Example

Example: Find the mode for the following continuous frequency distribution:

Class   Frequency
0-1     1
1-2     4
2-3     8
3-4     7
4-5     3
5-6     2
Total   25

$\text{Mode} = L + \frac{d_1}{d_1 + d_2} \times c$
L = 2
$d_1 = f_1 - f_0 = 8 - 4 = 4$
$d_2 = f_1 - f_2 = 8 - 7 = 1$
c = 1
Hence $\text{Mode} = 2 + \frac{4}{5} \times 1 = 2.8$
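The grouped-mode formula can be sketched the same way. The code below picks the first class with the highest frequency as the modal class and treats a missing neighbouring class as having frequency 0; both choices, along with the names used, are assumptions of this illustration.

```python
# Each class as (lower limit, upper limit, frequency)
classes = [(0, 1, 1), (1, 2, 4), (2, 3, 8), (3, 4, 7), (4, 5, 3), (5, 6, 2)]

def grouped_mode(classes):
    """Mode = L + d1 / (d1 + d2) * c for the first class with the highest frequency."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                      # position of the modal class
    lower, upper, f1 = classes[i]
    f0 = freqs[i - 1] if i > 0 else 0                # frequency preceding the modal class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0   # frequency succeeding the modal class
    d1, d2 = f1 - f0, f1 - f2
    if d1 + d2 == 0:
        raise ValueError("modal class is not distinct from its neighbours")
    return lower + d1 / (d1 + d2) * (upper - lower)

print(f"grouped mode = {grouped_mode(classes):.1f}")
```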
Standard Deviation for Grouped Data

The standard deviation for sample data, based on a frequency distribution, is given by
$S = \sqrt{\frac{\sum f (X - \bar{X})^2}{n - 1}}$
which is used to estimate the population standard deviation $\sigma$. Here
$\bar{X} = \frac{\sum fX}{n}$, n is the sample size = $\sum f$, and X = midpoint of each class
Standard Deviation for Grouped Data: Example

Frequency Distribution of Return on Investment of Mutual Funds
Return on Investment   Number of Mutual Funds
5-10                   10
10-15                  12
15-20                  16
20-25                  14
25-30                  8
Total                  60

From the Microsoft Excel spreadsheet on the previous slide, it is easy to see that
Mean = $\bar{X} = \frac{\sum fX}{n} = \frac{1040}{60} = 17.333$ (cell F10),
Standard Deviation = $S = \sqrt{\frac{\sum f (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{2448.33}{59}} = 6.44$ (cell H12)
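Since the spreadsheet itself is not reproduced here, the following sketch recomputes the grouped mean and sample standard deviation for the mutual fund data directly from the frequency distribution. The tuple layout and function name are assumptions for illustration.

```python
from math import sqrt

# Each class as (lower limit, upper limit, number of mutual funds)
funds = [(5, 10, 10), (10, 15, 12), (15, 20, 16), (20, 25, 14), (25, 30, 8)]

def grouped_mean_and_sd(classes):
    """Sample mean and standard deviation from class midpoints and frequencies."""
    n = sum(f for _, _, f in classes)
    mean = sum(f * (lower + upper) / 2 for lower, upper, f in classes) / n
    # Sum of f * (midpoint - mean)^2 over all classes
    squared_dev = sum(f * ((lower + upper) / 2 - mean) ** 2 for lower, upper, f in classes)
    return mean, sqrt(squared_dev / (n - 1))

mean, sd = grouped_mean_and_sd(funds)
print(f"mean = {mean:.3f}, standard deviation = {sd:.2f}")
```

The same function can be applied to any frequency distribution given as class limits and frequencies, as long as the total frequency is at least 2.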
Practice Problem
Q. 3.47, p. 154: calculate the sample mean, variance, and standard deviation
Age (Years)   Frequency
28-32         1
33-37         3
38-42         3
43-47         13
48-52         14
53-57         12
58-62         9
63-67         1
68-72         3
73-77         1
Solution
Midpoint (M)   Freq (f)   M*f    (M - mean)^2   f*(M - mean)^2
30             1          30     462.25         462.25
35             3          105    272.25         816.75
40             3          120    132.25         396.75
45             13         585    42.25          549.25
50             14         700    2.25           31.5
55             12         660    12.25          147
60             9          540    72.25          650.25
65             1          65     182.25         182.25
70             3          210    342.25         1026.75
75             1          75     552.25         552.25
sample mean = 51.5
variance = 81.61017
s = 9.033835
