04 Variability Jan18
Chen)
Lecture 4
January 18, 2024
Variability
THESE SLIDES ARE PROVIDED AS A COURTESY AND STUDY AID FOR YOUR PERSONAL USE.
DO NOT REPOST OR REDISTRIBUTE ANY PART OF THESE SLIDES WITHOUT YOUR INSTRUCTOR’S PERMISSION.
Announcements
• Check Canvas under “Assignments” for
Assignment 1 (due in 1 week)
• You may work together, but write-ups must be
done on your own and in your own words
• Please do not post public questions on Piazza
seeking answers on the assignments. If in doubt,
send a private message on Piazza to your
instructors
– Use “search for teammates” to find study group
partners
Announcements
• “Policy on Use of AI Content Generators” has
been added to the syllabus (p. 5-6)
Today’s Topics
• Finish discussion of central tendency
• Variability
• Normal Curve
Measures of Central Tendency
Advantages of the mode:
– Easy to calculate and understand
– It is really your only option if you are
working with nominal data
– Useful when you care about the most
“popular” answer (e.g., what size of a
product do most people buy?)
Disadvantages of the mode:
– It ignores most of your data set
Data set 1   Data set 2
10           200
8            99
7            83
7            65
7            23
7            18
4            7
3            7
3            7
2            7
Mode = 7 in both cases
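As a rough illustration of why the mode ignores most of the data, the sketch below runs Python's standard `statistics.mode` on the two data sets from this slide. The variable names are illustrative and not part of the lecture.

```python
from statistics import mode

# The two data sets from the slide (variable names are illustrative).
set_1 = [10, 8, 7, 7, 7, 7, 4, 3, 3, 2]
set_2 = [200, 99, 83, 65, 23, 18, 7, 7, 7, 7]

# mode() returns the most frequent value in each list.
mode_1 = mode(set_1)
mode_2 = mode(set_2)

# Both modes come out the same even though the other scores differ a great deal.
print(mode_1, mode_2)
```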
The Mean
• Sum of all scores divided by the total number of scores: $\frac{\sum X}{N}$
• The fulcrum, or balancing point, of all scores
Some important notation:
$\bar{X} = \frac{\sum X}{n}$
The Overall Mean
Useful when you want to combine data from
multiple groups of different sizes
Group 1    Group 2
6          3
4          2
3          2
2          1
2
1
Sum = 18   Sum = 8
$\bar{X}_{\text{overall}} = \frac{18 + 8}{6 + 4}$
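A minimal sketch of this calculation, using the raw scores from the slide. It also computes the plain average of the two group means for comparison, to show why group sizes matter. The variable names are illustrative.

```python
# Raw scores for the two groups on the slide (variable names are illustrative).
group_1 = [6, 4, 3, 2, 2, 1]
group_2 = [3, 2, 2, 1]

# Overall mean = total of all scores / total number of scores.
overall_mean = (sum(group_1) + sum(group_2)) / (len(group_1) + len(group_2))

# Simply averaging the two group means ignores the different group sizes.
mean_of_means = (sum(group_1) / len(group_1) + sum(group_2) / len(group_2)) / 2

print(overall_mean)   # (18 + 8) / (6 + 4)
print(mean_of_means)  # (3 + 2) / 2
```

The two results differ because Group 1 contributes six scores and Group 2 only four.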
The Overall Mean
What if you have only summary data for each
group?
Group 1       Group 2
Mean = 3      Mean = 2
n = 6         n = 4
Since, for each group, $\bar{X} = \frac{\sum X}{n}$, or equivalently $\sum X = n\bar{X}$, you can still figure out the sum for each group even if you only know the mean and n:
Sum = 6 x 3 = 18      Sum = 4 x 2 = 8
$\bar{X}_{\text{overall}} = \frac{(6)(3) + (4)(2)}{6 + 4}$
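The Overall Mean
General formula: $\bar{X}_{\text{overall}} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \cdots + n_k\bar{X}_k}{n_1 + n_2 + \cdots + n_k}$
For the two groups in this example: $\bar{X}_{\text{overall}} = \frac{(6)(3) + (4)(2)}{6 + 4}$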
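The overall mean can be sketched as a small function that works from summary data alone. The function name `overall_mean_from_summary` and its parameters are hypothetical, chosen only for this illustration.

```python
def overall_mean_from_summary(means, sizes):
    """Overall mean from per-group means and group sizes (parallel lists)."""
    if len(means) != len(sizes) or not sizes:
        raise ValueError("means and sizes must be non-empty and the same length")
    # n * mean recovers each group's sum of scores.
    total = sum(n * m for m, n in zip(means, sizes))
    return total / sum(sizes)

# Group 1: mean 3, n = 6; Group 2: mean 2, n = 4 (values from the slide).
result = overall_mean_from_summary([3, 2], [6, 4])
print(result)
```

Passing the means and sizes as parallel lists lets the same function handle any number of groups.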
Notice that the median is the same as the 50th
percentile point (P50).
Score   % scores lower
181     90%
172     80%
161     70%
130     60%
95      50%
The fine print: since N=10 in this example, the median location is actually halfway between the raw scores of 95 and 82.
(Pagano, p. 87)
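The "halfway between the two middle scores" rule for an even number of scores can be sketched as follows. This is an illustration only: it reuses the six Group 1 scores from the overall-mean example rather than the ten scores on this slide, and the variable names are assumptions.

```python
from statistics import median

# Group 1 scores from the overall-mean example, reused here for illustration.
scores = [6, 4, 3, 2, 2, 1]

ordered = sorted(scores)
mid = len(ordered) // 2
# With an even number of scores, the median is halfway between the two middle scores.
manual_median = (ordered[mid - 1] + ordered[mid]) / 2

# Percentage of scores that fall below the median.
pct_below = 100 * sum(x < manual_median for x in scores) / len(scores)

print(manual_median, median(scores), pct_below)
```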
Central Tendency and Symmetry
Central Tendency and Skew
Which graph is labeled correctly?
[Figure: four candidate graphs, labeled A, B, C, and D]
Outliers
Possible solutions:
– Report the median
– Report the mean with the outliers removed
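A small sketch of the two options above. The scores are hypothetical values invented for this illustration (they are not from the lecture), and treating 900 as the outlier is an assumption made only for the example.

```python
from statistics import mean, median

# Hypothetical scores invented for illustration; 900 plays the role of the outlier.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 900]

mean_all = mean(scores)      # pulled upward by the outlier
median_all = median(scores)  # does not depend on how extreme the top score is

# Second option from the slide: report the mean with the outlier removed.
trimmed = [x for x in scores if x != 900]
mean_trimmed = mean(trimmed)

print(mean_all, median_all, mean_trimmed)
```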
Summary of Central Tendency
         Pros                                  Cons
Mode     - easy to calculate                   - ignores most of the data
         - useful for nominal data             - strongly affected by sampling variation
Median   - less affected by extreme scores     - somewhat affected by sampling variation
           (outliers) than the mean
Today’s Topics
• Finish discussion of central tendency
• Variability
• Normal Curve
Average Mood vs. Moodiness
No indication of variability…
Typical ranges are indicated
Variability
If you only have a measure of central tendency
to describe a data set, you won’t be able to
distinguish between:
The range is an easy-to-calculate measure of
variability.
Range = highest score – lowest score
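As a quick sketch, the range can be computed for the two data sets from the earlier mode example. They share a mode of 7 but differ sharply in spread. The helper name `score_range` is hypothetical.

```python
def score_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)

# Data sets from the earlier mode example (variable names are illustrative).
set_1 = [10, 8, 7, 7, 7, 7, 4, 3, 3, 2]
set_2 = [200, 99, 83, 65, 23, 18, 7, 7, 7, 7]

range_1 = score_range(set_1)  # 10 - 2
range_2 = score_range(set_2)  # 200 - 7

print(range_1, range_2)
```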
Deviation
Another way we could compare the variability in
the two datasets is to describe the total amount
of deviation of raw scores from the mean.
How can we
express deviation
mathematically?
Deviation = distance between a raw score and the mean
Sample mean: $\bar{X} = \frac{\sum X}{N} = \frac{450}{9} = 50$
            Raw Score (X)   Deviation from mean ($X - \bar{X}$)
Amy         10              -40
Brian       20              -30
Christine   30              -20
Debbie      40              -10
Enid        50              0
Fred        60              10
Gloria      70              20
Harriet     80              30
Ivan        90              40
Sum of deviations = 0! Not very helpful….
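A minimal sketch of the deviation table, using the nine raw scores from the slide. It shows that deviations from the mean always cancel out, which is why their plain sum cannot measure variability. The variable names are illustrative.

```python
# Raw scores from the slide's table.
scores = {
    "Amy": 10, "Brian": 20, "Christine": 30, "Debbie": 40, "Enid": 50,
    "Fred": 60, "Gloria": 70, "Harriet": 80, "Ivan": 90,
}

# Sample mean: sum of scores divided by the number of scores (450 / 9).
mean = sum(scores.values()) / len(scores)

# Deviation = raw score minus the mean.
deviations = {name: x - mean for name, x in scores.items()}

# Negative and positive deviations cancel each other out.
sum_of_deviations = sum(deviations.values())

print(mean, sum_of_deviations)
```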
            Raw Score (X)   Deviation from mean ($X - \bar{X}$)   Squared Deviation ($(X - \bar{X})^2$)
Amy         10              -40                                   1600
Brian       20              -30                                   900
Christine   30              -20                                   400
Debbie      40              -10                                   100
Enid        50              0                                     0
Fred        60              10                                    100
Gloria      70              20                                    400
Harriet     80              30                                    900
Ivan        90              40                                    1600
Sum of squares (SS) = 6000
What is the formula for sum of squares (SS)?
2
A. ∑X − X
B. $(\sum X - \bar{X})^2$
C. $\sum (X - \bar{X})^2$
D. 2
∑(X − X )
An alternative formula for SS is:
$SS = \sum X^2 - \frac{(\sum X)^2}{N}$
*See textbook Tables 4.8 and 4.9 (pages 92-93) for an example of SS calculated with both of these formulas.
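As a check that the two SS formulas agree, the sketch below applies both to the nine raw scores from the deviation table (10 through 90). The variable names are illustrative.

```python
# Raw scores from the deviation table.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n = len(scores)
mean = sum(scores) / n

# Definitional formula: sum of squared deviations from the mean.
ss_definitional = sum((x - mean) ** 2 for x in scores)

# Alternative formula: sum of squared scores minus (sum of scores)^2 / N.
ss_computational = sum(x ** 2 for x in scores) - sum(scores) ** 2 / n

print(ss_definitional, ss_computational)
```

The alternative formula avoids computing each deviation, which is convenient when working by hand from raw scores.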
Variance
Dividing SS by the sample size N will give us the
variance.
For a population: $\sigma^2 = \frac{SS}{N}$ ($\sigma$ is lower case "sigma")
For a sample: $s^2 = \frac{SS}{N - 1}$
Dividing the sample SS by N − 1, rather than N, will lead to a better estimate of the population variance.
Sample standard deviation: $s = \sqrt{s^2} = \sqrt{\frac{SS}{N - 1}}$
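A short sketch of these formulas applied to the nine raw scores from the deviation table, cross-checked against Python's standard `statistics` module. The variable names are illustrative.

```python
import math
from statistics import pvariance, stdev, variance

# Raw scores from the deviation table.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n = len(scores)
mean = sum(scores) / n

# Sum of squared deviations from the mean.
ss = sum((x - mean) ** 2 for x in scores)

pop_variance = ss / n             # population formula: SS / N
sample_variance = ss / (n - 1)    # sample formula: SS / (N - 1)
sample_sd = math.sqrt(sample_variance)

# The standard library uses the same definitions:
# pvariance divides by N; variance and stdev divide by N - 1.
print(pop_variance, pvariance(scores))
print(sample_variance, variance(scores))
print(sample_sd, stdev(scores))
```

The sample variance is the larger of the two because it divides the same SS by a smaller number.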
Recommended Homework
Problems at the end of textbook Ch. 4 (p. 96-99)
– 1-15, 21, 22, 26-29, 33-34
– For extra practice: 30-32, 35-40