Annotated 3 Ch3 Data Description F2014
Empirical Distribution: the pattern actually followed by the observed data
Example: Roll one six-sided die 60 times and record the number on the side that lands face up.
Variable of interest: the number on the face-up side
Observed data pattern: six 1s, eight 2s, twelve 3s, ten 4s, fourteen 5s, ten 6s.
Dot Diagram
1. Order data (smallest to largest)
2. Label the x-axis with the range of data values and the y-axis with the count of values at each distinct point.
3. Add one dot to the plot for each data value, stacking duplicate values vertically.
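The three steps can be sketched in Python as a text dot diagram for the die-roll data above. This is a minimal sketch: `dot_diagram` is a hypothetical helper, it prints one `o` per observation, and it assumes every distinct value has a one-character label so that the columns line up. A plotting library would normally draw the actual figure.

```python
from collections import Counter

def dot_diagram(values):
    """Return the rows of a text dot diagram, top row first.

    Each distinct value gets one column; each observation adds one 'o',
    stacked vertically. Assumes one-character value labels.
    """
    counts = Counter(values)
    xs = sorted(counts)                 # step 1: order the distinct values
    height = max(counts.values())       # tallest stack sets the y-range
    rows = []
    for level in range(height, 0, -1):  # step 3: stack dots, top row first
        rows.append(" ".join("o" if counts[x] >= level else " " for x in xs))
    rows.append(" ".join(str(x) for x in xs))  # step 2: x-axis labels
    return rows

# Die-roll example: six 1s, eight 2s, twelve 3s, ten 4s, fourteen 5s, ten 6s
rolls = [1] * 6 + [2] * 8 + [3] * 12 + [4] * 10 + [5] * 14 + [6] * 10
print("\n".join(dot_diagram(rolls)))
```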
Example 1
The government requires manufacturers to monitor the amount of radiation emitted through the closed door of a microwave oven. The following are the radiation amounts emitted by 24 microwave ovens, as measured by one manufacturer:
.01 .10 .18 .08 .05 .20
.05 .10 .30 .11 .20 .15
.02 .01 .12 .09 .08 .05
.03 .09 .10 .02 .07 .10
Stem-and-Leaf Plots
1. Order data values
2. Select one or more leading digits for the stem values; the last digit becomes the leaf
3. List possible stem values in a vertical column
4. Draw a vertical line to the right of the stem
5. Add leaf values, in order, on the other side of the vertical line
Example 1 (part 2)
It is easiest to order your data before any analysis.
.01 .01 .02 .02 .03 .05
.05 .05 .07 .08 .08 .09
.09 .10 .10 .10 .10 .11
.12 .15 .18 .20 .20 .30
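Stem-and-leaf plot (stem = tenths digit, leaf = hundredths digit; for example, 1 | 5 represents .15):
0 | 1122355578899
1 | 00001258
2 | 00
3 | 0
The same data with each stem split in two (leaves 0-4 on the first line, 5-9 on the second):
0 | 11223
0 | 55578899
1 | 000012
1 | 58
2 | 00
2 |
3 | 0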
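The five steps can be sketched in Python for the radiation data. To avoid floating-point digit extraction, this minimal sketch assumes the data are entered as whole numbers of hundredths (so .07 becomes 7); the stem is then the tenths digit and the leaf the hundredths digit. `stem_and_leaf` is a hypothetical helper that reproduces the unsplit stem-and-leaf plot of Example 1.

```python
from collections import defaultdict

def stem_and_leaf(hundredths):
    """Return stem-and-leaf rows for values given in hundredths.

    Stem = tenths digit, leaf = hundredths digit (e.g. 15 -> "1 | 5").
    """
    leaves = defaultdict(list)
    for v in sorted(hundredths):             # step 1: order the data
        leaves[v // 10].append(str(v % 10))  # step 2: split into stem and leaf
    # step 3: list every possible stem, even ones with no leaves
    stems = range(min(leaves), max(leaves) + 1)
    # steps 4-5: stem, vertical line, then the ordered leaves
    return [f"{s} | {''.join(leaves[s])}" for s in stems]

# Example 1 radiation data, in hundredths
radiation_hundredths = [1, 10, 18, 8, 5, 20, 5, 10, 30, 11, 20, 15,
                        2, 1, 12, 9, 8, 5, 3, 9, 10, 2, 7, 10]
print("\n".join(stem_and_leaf(radiation_hundredths)))
```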
3. Data set 1: 3.5 4.2 4.6 4.6 5.0 5.1 6.4 6.8
Data set 2: 5.2 5.5 5.7 5.7 5.8 5.9 6.2 7.2
Frequency Tables
Use intervals of equal length
The number of intervals is a matter of judgment
Every interval endpoint belongs to exactly one interval (i.e., no overlapping)
Example 1 (part 3)
Interval      Frequency
0.01-0.05     8
0.06-0.10     9
0.11-0.15     3
0.16-0.20     3
0.21-0.25     0
0.26-0.30     1
Example 1
Interval      Frequency   Relative Frequency   Cumulative Relative Frequency
0.01-0.05     8           .333                 .333
0.06-0.10     9           .375                 .708
0.11-0.15     3           .125                 .833
0.16-0.20     3           .125                 .958
0.21-0.25     0           .000                 .958
0.26-0.30     1           .042                 1.00
Sum           24          1.00
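The three columns can be computed with a short Python sketch. It assumes the radiation data are entered as whole numbers of hundredths, so an interval such as 0.06-0.10 becomes the integer range 6-10 and each value falls in exactly one interval; `frequency_table` and its `start`, `width` and `k` parameters are hypothetical names.

```python
def frequency_table(values, start, width, k):
    """Return rows (low, high, freq, rel_freq, cum_rel_freq) for k classes.

    Class j covers start + j*width through start + (j+1)*width - 1
    (integer data), so no value can fall in two classes.
    """
    n = len(values)
    rows = []
    cumulative = 0
    for j in range(k):
        low = start + j * width
        high = low + width - 1
        freq = sum(low <= v <= high for v in values)  # count values in class
        cumulative += freq
        rows.append((low, high, freq, freq / n, cumulative / n))
    return rows

# Example 1 radiation data, in hundredths (7 means .07)
radiation_hundredths = [1, 10, 18, 8, 5, 20, 5, 10, 30, 11, 20, 15,
                        2, 1, 12, 9, 8, 5, 3, 9, 10, 2, 7, 10]

for low, high, f, rel, cum in frequency_table(radiation_hundredths, 1, 5, 6):
    print(f"0.{low:02d}-0.{high:02d}   {f:2d}   {rel:.3f}   {cum:.3f}")
```

Working in integer hundredths keeps the class boundaries exact; with raw floats such as 0.05, a comparison against an endpoint could be thrown off by rounding error.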
Histogram
A bar plot of the frequencies or relative frequencies from a frequency table
How to make a histogram:
Use intervals of equal length
Show the entire vertical axis, beginning at zero, and avoid breaking either axis
Keep a uniform scale across a given axis
Center bars of appropriate heights at the midpoints of the intervals
Example 1
[Figure: histogram of the Example 1 radiation data]
[Figure: example distribution shapes: Uniform, Right-Skewed, Left-Skewed, Bimodal, Truncated (J-shaped)]
Quantiles
Definition: for any number $0 \le p \le 1$, the $p$ quantile is the number, denoted $Q(p)$, such that a fraction $p$ of the distribution lies to the left of (below) $Q(p)$ and a fraction $1 - p$ lies to the right of (above) $Q(p)$.
For an ordered data set $x_1 \le x_2 \le \cdots \le x_n$ and $i = 1, 2, \ldots, n$, the $p = \frac{i - .5}{n}$ quantile is $x_i$. That is:
$$Q\left(\frac{i - .5}{n}\right) = x_i$$
Example 2
Annual incomes (in thousands of dollars) for 8 families (in a common geographical location)
are given below:
23, 31, 43, 47, 51, 58, 67, 103
Which quantiles are exactly observations from this data set?
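The definition pairs each ordered value $x_i$ with the level $p = (i - .5)/n$, so those $n$ levels are exactly the quantiles that are observations. A minimal Python sketch for the income data follows; `exact_quantile_levels` is a hypothetical helper name.

```python
def exact_quantile_levels(values):
    """Pair each ordered value x_i with p = (i - .5)/n, so that Q(p) = x_i."""
    xs = sorted(values)
    n = len(xs)
    return [((i - 0.5) / n, x) for i, x in enumerate(xs, start=1)]

incomes = [23, 31, 43, 47, 51, 58, 67, 103]  # thousands of dollars
for p, x in exact_quantile_levels(incomes):
    print(f"Q({p:.4f}) = {x}")
```

With $n = 8$ the levels are $.0625, .1875, \ldots, .9375$, i.e., steps of $1/8$ starting at $.5/8$.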
Example 2 (part 2)
23, 31, 43, 47, 51, 58, 67, 103
1. What data value corresponds to the .25 quantile?
Quartiles
Special quantiles:
$Q(.25)$: $Q_1$, 1st quartile, lower quartile
$Q(.5)$: $Q_2$, 2nd quartile, median
$Q(.75)$: $Q_3$, 3rd quartile, upper quartile
Special values associated with quartiles:
Inter-quartile range (IQR): $Q_3 - Q_1$
Upper fence: $Q_3 + 1.5\,\mathrm{IQR}$
Lower fence: $Q_1 - 1.5\,\mathrm{IQR}$
Boxplot
Steps for making a boxplot (with ordered data):
0. Draw your scale.
1. Draw vertical lines at $Q_1$, $Q_2$, $Q_3$ and connect them with a box.
2. Compute $\mathrm{IQR} = Q_3 - Q_1$ and the upper and lower fences:
Upper Fence $= Q_3 + 1.5\,\mathrm{IQR}$
Lower Fence $= Q_1 - 1.5\,\mathrm{IQR}$
3. Draw asterisks (or dots) for any data values less than the lower fence and any values greater than the upper fence; we will define these as outliers.
4. Draw a line (whisker) from the left side of the box to the smallest value greater than the LF, and from the right side of the box to the largest value smaller than the UF.
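Steps 2 through 4 can be sketched in Python once $Q_1$ and $Q_3$ are known. `boxplot_parts` is a hypothetical helper. The usage below plugs in $Q_1 = 37$ and $Q_3 = 62.5$ for the Example 2 incomes, which is what linear interpolation between the $(i - .5)/n$ quantiles gives (an assumption about how in-between quantiles are computed). A value lying exactly on a fence is treated as not an outlier, matching the strict inequalities in step 3.

```python
def boxplot_parts(values, q1, q3):
    """Return IQR, fences, whisker ends and outliers, given Q1 and Q3."""
    xs = sorted(values)
    iqr = q3 - q1                      # step 2: inter-quartile range
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # step 3: values strictly beyond a fence are plotted as outliers
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    # step 4: whiskers reach the most extreme values inside the fences
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    return {
        "IQR": iqr,
        "lower fence": lower_fence,
        "upper fence": upper_fence,
        "whisker ends": (min(inside), max(inside)),
        "outliers": outliers,
    }

incomes = [23, 31, 43, 47, 51, 58, 67, 103]
for name, value in boxplot_parts(incomes, q1=37, q3=62.5).items():
    print(f"{name}: {value}")
```

With these inputs the IQR is 25.5, the fences are $-1.25$ and $100.75$, so 103 is flagged as an outlier and the whiskers run from 23 to 67.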
Example 2 (part 3)
Make a boxplot
23, 31, 43, 47, 51, 58, 67, 103
1. First we need quartiles.
(a) (From above) Q1 =
(b) (From above) Q3 =
(c) Q2 =
Example 3
Ten batteries were tested to determine how long (in hours) they would last under normal conditions. Below are the ten values that were obtained:
100, 120, 80, 90, 95, 115, 120, 110, 105, 95
1. Calculate Q(.35)
2. Calculate Q(.42)
3. Calculate Q(.90)
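For values of $p$ that are not of the form $(i - .5)/n$, a definition is needed for what lies between the plotted points. The sketch below assumes the common convention of linear interpolation between the two neighbouring $(i - .5)/n$ quantiles, and leaves $Q(p)$ undefined below $.5/n$ or above $(n - .5)/n$. The function name `quantile` is our own; library routines (for example `numpy.percentile` with its default settings) use a different convention and can give different answers.

```python
def quantile(values, p):
    """Q(p) with Q((i - .5)/n) = x_i and linear interpolation in between.

    Assumes .5/n <= p <= (n - .5)/n; outside that range this sketch
    raises ValueError instead of extrapolating.
    """
    xs = sorted(values)
    n = len(xs)
    # Solve p = (pos - .5)/n for the 1-based position; rounding strips
    # float noise such as 3.9999999999999996.
    pos = round(n * p + 0.5, 10)
    if pos < 1 or pos > n:
        raise ValueError("p must lie between .5/n and (n - .5)/n")
    i = int(pos)                 # lower neighbour (1-based index)
    frac = pos - i               # how far p lies toward the next value
    if i == n:
        return xs[-1]
    return xs[i - 1] + frac * (xs[i] - xs[i - 1])

batteries = [100, 120, 80, 90, 95, 115, 120, 110, 105, 95]
for p in (0.35, 0.42, 0.90):
    print(f"Q({p}) = {quantile(batteries, p):g}")
```

Under this convention the batteries give $Q(.35) = 95$ (an exact observation, since $.35 = (4 - .5)/10$), $Q(.42) = 98.5$ (seven tenths of the way from 95 to 100) and $Q(.90) = 120$.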
Side-by-Side Boxplots
Side-by-side boxplots can be used to compare two data sets. Make sure they are drawn on the same scale so that a comparison is possible.
Example 4a ($n_1 = n_2$)
Data Set 1: 1, 2, 3, 4, 5
Data Set 2: 6, 7, 8, 9, 10
Example 4b ($n_1 \ne n_2$)
Data Set 1: 1, 2, 3, 4, 5
Data Set 2: 6, 7, 8, 9, 10, 11
Example 4c
Data set 1: 1, 5, 7, 8, 9, 10
Data set 2: -10, -9, -8, -7, -5, -1
Median (Location)
Same as $Q(.5)$; i.e., gives the center value of the data set.
Not affected by a few extreme or outlying observations.
Example:
2, 3, 5, 8, 12     $Q(.5) = 5$
2, 3, 5, 8, 100    $Q(.5) = 5$
Mean (Location)
For $x_1, x_2, \ldots, x_n$ the mean is given by
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Also called the first moment or center of mass. Strongly affected by a few extreme or outlying observations.
Example:
2, 3, 5, 8, 12     $\bar{x} = 6$
2, 3, 5, 8, 100    $\bar{x} = 23.6$
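The contrast between the two location measures can be checked with Python's standard `statistics` module, used here as a convenient stand-in for hand calculation; `location_summary` is a hypothetical helper name. For an odd number of values `statistics.median` returns the middle ordered value, and for an even number it averages the two middle values, which agrees with $Q(.5)$ under the $(i - .5)/n$ definition.

```python
import statistics

def location_summary(values):
    """Return (median, mean) for a list of numbers."""
    return statistics.median(values), statistics.mean(values)

for data in ([2, 3, 5, 8, 12], [2, 3, 5, 8, 100]):
    median, mean = location_summary(data)
    print(data, "median =", median, "mean =", mean)
```

Replacing 12 by 100 leaves the median at 5 but pulls the mean from 6 up to 23.6.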
Mode (Location)
The most frequently occurring data point
Can also be used for qualitative data
Can have multiple modes
Not affected by outliers (so to speak)
Example:
2, 3, 5, 5, 5, 8, 8, 12          mode = 5
2, 3, 5, 5, 5, 8, 8, 8, 100      modes = 5, 8
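Python's `statistics.multimode` (available since Python 3.8) returns every most-frequent value, in order of first appearance, so it handles both the single-mode and the multiple-mode case. The color list at the end is hypothetical data, included only to show that the mode also applies to qualitative values.

```python
import statistics

for data in ([2, 3, 5, 5, 5, 8, 8, 12], [2, 3, 5, 5, 5, 8, 8, 8, 100]):
    print(data, "mode(s):", statistics.multimode(data))

# Modes also make sense for qualitative data (hypothetical categories)
print(statistics.multimode(["red", "blue", "red", "green"]))
```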
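Sample Variance and Standard Deviation (Spread)
For $x_1, x_2, \ldots, x_n$ the sample variance is
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
-or-
$$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right)$$
The sample standard deviation is $s = \sqrt{s^2}$.
Gives a measure of how much the data are spread around the sample mean. Larger values of $s^2$ indicate more spread.
Roughly the average squared distance from the mean (dividing by $n - 1$ rather than $n$).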
Example 5
Calculate the mean and standard deviation for the data below.
4, 8, 2, 14, 7, 12
Recalculate the standard deviation using the summary statistics $\sum_{i=1}^{n} x_i = 47$ and $\sum_{i=1}^{n} x_i^2 = 473$.
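Both variance formulas can be checked side by side on the Example 5 data. This is a minimal sketch; the variable names are our own, and `statistics.stdev` (which also divides by $n - 1$) is printed only as a cross-check.

```python
import math
import statistics

x = [4, 8, 2, 14, 7, 12]
n = len(x)
mean = sum(x) / n

# Definition: sum of squared deviations from the mean, divided by n - 1
s2_definition = sum((xi - mean) ** 2 for xi in x) / (n - 1)

# Shortcut: needs only the summary statistics sum(x_i) and sum(x_i^2)
sum_x = sum(x)
sum_x2 = sum(xi ** 2 for xi in x)
s2_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

s = math.sqrt(s2_shortcut)
print(f"sum x = {sum_x}, sum x^2 = {sum_x2}")
print(f"mean = {mean:.4f}, s^2 = {s2_shortcut:.4f}, s = {s:.4f}")
print(f"statistics.stdev: {statistics.stdev(x):.4f}")
```

Both routes give $\bar{x} \approx 7.83$, $s^2 \approx 20.97$ and $s \approx 4.58$; the shortcut needs only the two sums, which is why it is convenient when only summary statistics are available.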
Sample variance, $s^2$
Parameter: a numerical summary of population data. (More on this in chapter 5.)
Population mean, $\mu$
Population variance, $\sigma^2$
p =
Graphical tools
Bar chart: like a histogram, but with one bar for each category (or value) rather than for each interval
Pie chart
Dot diagram
See Section 3.4 for more details