04 Variability Jan18

PSYC 218 006 (Dr. Chen)
Lecture 4
January 18, 2024
Variability
THESE SLIDES ARE PROVIDED AS A COURTESY AND STUDY AID FOR YOUR PERSONAL USE.
DO NOT REPOST OR REDISTRIBUTE ANY PART OF THESE SLIDES WITHOUT YOUR INSTRUCTOR’S PERMISSION.
2
Announcements
• Check Canvas under “Assignments” for
Assignment 1 (due in 1 week)
• You may work together, but write-ups must be
done on your own and in your own words
• Please do not post public questions on Piazza
seeking answers on the assignments. If in doubt,
send a private message on Piazza to your
instructors
– Use “search for teammates” to find study group
partners
3
Announcements
• “Policy on Use of AI Content Generators” has
been added to the syllabus (p. 5-6)
6
Today’s Topics
• Finish discussion of central tendency
• Variability
• Normal Curve
7
Measures of Central Tendency
8
Advantages of the mode:
– Easy to calculate and understand
– It is really your only option if you are
working with nominal data
– Useful when you care about the most
“popular” answer (e.g., what size of a
product do most people buy?)
9
Disadvantages of the mode:
– It ignores most of your data set
Two example data sets (one per column):
10    200
 8     99
 7     83
 7     65
 7     23
 7     18
 4      7
 3      7
 3      7
 2      7
Mode = 7 in both cases
10
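As a quick check of the point above, here is a minimal Python sketch using the two example data sets from the slide. The variable names `left` and `right` are just illustrative labels for the two columns.

```python
from statistics import mode

# The two example data sets from the slide (left and right columns).
left = [10, 8, 7, 7, 7, 7, 4, 3, 3, 2]
right = [200, 99, 83, 65, 23, 18, 7, 7, 7, 7]

# The mode only reflects the most frequent score ...
mode_left = mode(left)
mode_right = mode(right)
print(mode_left, mode_right)

# ... so it cannot tell these two very different data sets apart,
# even though their highest scores are nowhere near each other.
print(max(left), max(right))
```

Both modes come out as 7, even though the data sets differ in almost every other score.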
• The mean is the sum of all scores divided by the total number
of scores: (∑X) / N
• The fulcrum, or balancing point, of all scores
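A small sketch of both bullets, assuming we borrow the Group 1 scores from the overall-mean example later in this lecture. The "balancing point" idea shows up as the deviations from the mean adding up to zero.

```python
# Group 1 scores from the overall-mean example later in this lecture.
scores = [6, 4, 3, 2, 2, 1]

# Mean = sum of all scores divided by the number of scores.
mean = sum(scores) / len(scores)

# Balancing point: distances above and below the mean cancel out.
deviations = [x - mean for x in scores]
balance = sum(deviations)

print(mean)
print(balance)
```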
11
Some important notation:
X̄ (“X bar”) = mean of a sample
µ (“mew”) = mean of a population
X̄ = ∑X / n
12
The Overall Mean
Useful when you want to combine data from
multiple groups of different sizes
Group 1     Group 2
6           3
4           2
3           2
2           1
2
1
Sum = 18    Sum = 8
X̄ overall = (18 + 8) / (6 + 4)
13
The Overall Mean
What if you have only summary data for each
group?
Group 1     Group 2
Mean = 3    Mean = 2
n = 6       n = 4
Since, for each group, X̄ = ∑X / n, or… ∑X = nX̄,
you can still figure out the sum for each group even if you only know
the mean and n:
Sum = 6 × 3 = 18    Sum = 4 × 2 = 8
X̄ overall = [(6)(3) + (4)(2)] / (6 + 4)
15
The Overall Mean
X̄ overall = [(6)(3) + (4)(2)] / (6 + 4)
General formula: X̄ overall = (n₁X̄₁ + n₂X̄₂ + … + nₖX̄ₖ) / (n₁ + n₂ + … + nₖ)
Note that this formula gives you the same answer as
adding up all the raw scores and dividing by the total N
*More practice in your textbook, p.83-85*
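The claim that both routes give the same answer can be checked with a short sketch. The function name `overall_mean` is a hypothetical helper, not something from the textbook; the numbers are the Group 1 and Group 2 values from the slides.

```python
def overall_mean(means, sizes):
    """Combine group means into one overall mean, weighting each by its group size."""
    # For each group, the sum of scores equals n times the group mean.
    total_sum = sum(n * m for m, n in zip(means, sizes))
    total_n = sum(sizes)
    return total_sum / total_n

# Raw scores for the two groups from the slides.
group1 = [6, 4, 3, 2, 2, 1]
group2 = [3, 2, 2, 1]

# Route 1: only the summary data (means and group sizes).
from_summary = overall_mean(means=[3, 2], sizes=[6, 4])

# Route 2: add up all the raw scores and divide by the total N.
all_scores = group1 + group2
from_raw = sum(all_scores) / len(all_scores)

print(from_summary, from_raw)
```

Both values come out to 2.6, that is, 26 / 10.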

16
• The median is the score that is exactly in the middle (half of
the other scores are higher, and half are
lower)
• When there is an even number of scores, the
two centermost scores are averaged to get the
median.
Odd number of scores    Even number of scores
7                       7
6                       6
4                       5
2                       4
2                       2
                        2
Median = 4              Median = 4.5
17
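The two small data sets above can be checked with Python's standard `statistics.median`, which averages the two centermost scores when the count is even. The variable names are illustrative.

```python
from statistics import median

odd_scores = [7, 6, 4, 2, 2]       # five scores: the middle one is the median
even_scores = [7, 6, 5, 4, 2, 2]   # six scores: average the two centermost (5 and 4)

median_odd = median(odd_scores)
median_even = median(even_scores)
print(median_odd, median_even)
```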
For raw data tables where the numbers are in
order, you can find the location of the median
using the formula:
(N + 1) / 2
Scores: 7, 6, 4, 2, 2
Here N = 5, so (5 + 1) / 2 = 3. The location of the median
is the 3rd score from the
top (or bottom). The value
of the median is 4.
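One way to turn the location formula into code is sketched below. `median_by_location` is a hypothetical helper name. When the location lands halfway between two positions (even N), the sketch averages the two neighbouring scores, matching the rule from the previous slide.

```python
def median_by_location(scores):
    """Return (location, median) using the (N + 1) / 2 location formula."""
    ordered = sorted(scores, reverse=True)   # highest score at the top
    location = (len(ordered) + 1) / 2        # 1-based position in the ordered list
    lower = ordered[int(location) - 1]       # score at (or just above) the location
    if location.is_integer():
        return location, lower
    upper = ordered[int(location)]           # next score down, for a .5 location
    return location, (lower + upper) / 2

result_odd = median_by_location([7, 6, 4, 2, 2])
result_even = median_by_location([7, 6, 5, 4, 2, 2])
print(result_odd)
print(result_even)
```

For the five scores on this slide the location is 3 and the value is 4; for the six-score example the location is 3.5 and the value is 4.5.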
18
Notice that the median is the same as the 50th
percentile point (P50).
Score    % scores lower
181      90%
172      80%
161      70%
130      60%
95       50%
82       40%
60       30%
55       20%
43       10%
29       0% (1%)
The fine print: since N=10 in this example, the median location is
actually halfway between the raw scores of 95 and 82.
For this class, if I ask you “what is the median in this sample?” I will
accept either 95 as the answer (since that is the 50th percentile
point), or 88.5 (since that is the average of 95 and 82).
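The table and the fine print can be reproduced with a short sketch. `percent_lower` is a hypothetical helper that simply counts how many scores fall below a given score; it is one simple reading of the "% scores lower" column, not an official percentile definition.

```python
from statistics import median

# The ten scores from the table above.
scores = [181, 172, 161, 130, 95, 82, 60, 55, 43, 29]

def percent_lower(score, all_scores):
    """Percentage of scores in the sample that are lower than the given score."""
    lower = sum(1 for s in all_scores if s < score)
    return 100 * lower / len(all_scores)

table = {s: percent_lower(s, scores) for s in scores}
print(table[95])        # the score at the 50th percentile point

# With N = 10, the median location falls halfway between 95 and 82.
averaged_median = median(scores)
print(averaged_median)
```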
19
Mean vs. median
The median is less sensitive to extreme scores
(outliers) than the mean.
Pagano, p. 87
20
Central Tendency and Symmetry
21
Central Tendency and Skew
The mean gets “pulled into” the tail more than
the median
22
Which graph is labeled correctly?
[Figure: four graphs (A, B, C, D), each with the mean, median, and mode labeled in a different order]
23
Outliers
• Outliers are highly atypical scores
• Reporting the mean of a sample can be
misleading when you have large outliers
– Example: mean net worth in a sample that
happened to include a billionaire
25
Outliers
Possible solutions:
– Report the median
– Report the mean with the outliers removed
This strategy is common when the source of the outlier is suspected to
be measurement error
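To see how much a single outlier can matter, here is a sketch of the net worth example. The dollar amounts are made up for illustration (they are not from the lecture); the last value stands in for the billionaire.

```python
from statistics import mean, median

# Hypothetical net worths in dollars; the last value stands in for the billionaire.
net_worths = [40_000, 55_000, 62_000, 75_000, 90_000, 1_000_000_000]

# Mean of the full sample: dragged far upward by the single outlier.
with_outlier = mean(net_worths)

# Possible solution 1: report the median.
middle = median(net_worths)

# Possible solution 2: report the mean with the outlier removed.
without_outlier = mean(net_worths[:-1])

print(with_outlier, middle, without_outlier)
```

In this made-up sample the mean is well over 100 million dollars, while the median (68,500) and the mean without the outlier (64,400) both describe the typical person far better.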
26
Summary of Central Tendency
         Pros                                Cons
Mode     - easy to calculate                 - ignores most of the data
         - useful for nominal data           - strongly affected by sampling variation
Median   - less affected by extreme          - somewhat affected by sampling variation
           scores (outliers) than the mean
Mean     - least affected by sampling        - most affected by extreme scores and skew
           variation
27
Today’s Topics
• Variability
• Normal Curve
31
Average Mood vs. Moodiness
(Central Tendency vs. Variability)
Variability can give us more information than the average alone.
32
No indication of variability…
33
Typical ranges are indicated
34
Variability
If you only have a measure of central tendency
to describe a data set, you won’t be able to
distinguish between:
[Figure: data set A and data set B]
Data set A and data set B have equal means,
medians, and modes.
However, data set B has a greater
range than data set A. The data in
B vary more.
35
The range is an easy-to-calculate measure of
variability.
Range = highest score – lowest score
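A minimal sketch of the range formula, reusing the two example data sets shown earlier for the mode (both have a mode of 7). `score_range` is a hypothetical helper name.

```python
def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)

# The two example data sets from the mode slide.
left = [10, 8, 7, 7, 7, 7, 4, 3, 3, 2]
right = [200, 99, 83, 65, 23, 18, 7, 7, 7, 7]

range_left = score_range(left)
range_right = score_range(right)
print(range_left, range_right)
```

The ranges are 8 and 193, so the range picks up a difference that the mode missed.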
However, the range still doesn’t allow us to
distinguish between:
[Figure: data set A and data set B]
Data set A and data set B have equal means,
medians, and modes. And equal ranges, too!
However, the scores in data set A
are more spread out (they vary
more) than the scores in data set B.
What else can we use to compare variability?
36
Deviation
On average, the scores in data set A are further
away (or “deviate” more) from the mean than
the scores in data set B.
37
Deviation
Another way we could compare the variability in
the two datasets is to describe the total amount
of deviation of raw scores from the mean.
How can we
express deviation
mathematically?
38
Deviation = distance between a raw score and the mean
For a population: deviation = X − µ
For a sample: deviation = X − X̄
39
Sample mean:
X̄ = ∑X / N = 450 / 9 = 50
Raw Score (X)
Amy 10
Brian 20
Christine 30
Debbie 40
Enid 50
Fred 60
Gloria 70
Harriet 80
Ivan 90
40
Sample mean:
X̄ = ∑X / N = 450 / 9 = 50
Raw Score (X)   Deviation from mean (X − X̄)
Amy 10 -40
Brian 20 -30
Christine 30 -20
Debbie 40 -10
Enid 50 0
Fred 60 10
Gloria 70 20
Harriet 80 30
Ivan 90 40
41
Sum of deviations = 0! Not very helpful….
Sample mean:
X̄ = ∑X / N = 450 / 9 = 50
Raw Score (X)   Deviation from mean (X − X̄)   Squared Deviation (X − X̄)²
Amy 10 -40 1600
Brian 20 -30 900
Christine 30 -20 400
Debbie 40 -10 100
Enid 50 0 0
Fred 60 10 100
Gloria 70 20 400
Harriet 80 30 900
Ivan 90 40 1600
Sum of squares (SS) = 6000
42
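The table above can be rebuilt in a few lines of Python. This is only a sketch of the arithmetic on the slide: the deviations sum to zero, and squaring them first gives SS = 6000.

```python
# Raw scores from the table above.
scores = {
    "Amy": 10, "Brian": 20, "Christine": 30, "Debbie": 40, "Enid": 50,
    "Fred": 60, "Gloria": 70, "Harriet": 80, "Ivan": 90,
}

n = len(scores)
mean = sum(scores.values()) / n

# Deviation = raw score minus the sample mean.
deviations = {name: x - mean for name, x in scores.items()}

# Positive and negative deviations cancel, so their sum is always zero.
sum_of_deviations = sum(deviations.values())

# Squaring each deviation first removes the signs; the total is SS.
ss = sum(d ** 2 for d in deviations.values())

print(mean, sum_of_deviations, ss)
```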
What is the formula for sum of squares (SS)?
A. ∑X − X̄²
B. (∑X − X̄)²
C. ∑(X − X̄)²
D. ∑(X − X̄²)
43
What is the formula for sum of squares (SS)?
A. ∑X − X̄²
B. (∑X − X̄)²
C. ∑(X − X̄)²
This one means:
1. For each raw score, find the deviation (distance from the mean)
2. Square each deviation
3. Add up all the squared deviations
D. ∑(X − X̄²)
44
An alternative formula for SS is:
SS = ∑X² − (∑X)² / N
*See textbook Tables 4.8 and 4.9 (on pages 92-93) to see an example of SS
calculated with both of these formulas
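As a sanity check, the sketch below computes SS both ways for the nine raw scores used in this lecture (10 through 90). The variable names are illustrative only.

```python
# The nine raw scores from the lecture example.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n = len(scores)
mean = sum(scores) / n

# Deviation formula: sum of squared deviations from the mean.
ss_deviation = sum((x - mean) ** 2 for x in scores)

# Alternative formula: sum of squared scores minus (sum of scores) squared over N.
sum_x = sum(scores)                    # 450
sum_x_squared = sum(x ** 2 for x in scores)   # 28500
ss_alternative = sum_x_squared - sum_x ** 2 / n

print(ss_deviation, ss_alternative)
```

Both formulas give 6000: 28500 − 202500 / 9 = 28500 − 22500.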
45
Variance
Dividing SS by the sample size N will give us the
variance.
Variance expresses the “average” squared deviation from the mean.
46
Variance
For a population: σ² = SS / N     (σ is “sigma”, lower case)
For a sample: s² = SS / (N − 1)
47
Variance
For a population: σ² = SS / N
For a sample: s² = SS / (N − 1)
Dividing SS by N − 1, rather than N, when computing the sample
variance will lead to a better estimate of the
population variance
For more info (completely optional) see:

https://stats.stackexchange.com/questions/3931/intuitive-explanation-for-dividing-by-n-1-when-calculating-standard-deviation
https://www.youtube.com/watch?v=wpY9o_OyxoQ
48
Raw Score (X)   Deviation from mean (X − X̄)   Squared Deviation (X − X̄)²
Amy 10 -40 1600
Brian 20 -30 900
Christine 30 -20 400
Debbie 40 -10 100
Enid 50 0 0
Fred 60 10 100
Gloria 70 20 400
Harriet 80 30 900
Ivan 90 40 1600
N = 9, SS = 6000
s² = SS / (N − 1) = 6000 / (9 − 1) = 750 “squared” points
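The same calculation in Python, as a sketch. It also shows what dividing by N instead of N − 1 would give, and cross-checks both against the standard library's `statistics.variance` (sample, N − 1) and `statistics.pvariance` (population, N).

```python
from statistics import pvariance, variance

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)   # sum of squares = 6000

# Sample variance: divide SS by N - 1.
sample_variance = ss / (n - 1)

# Population formula, for comparison: divide SS by N.
population_variance = ss / n

print(sample_variance)                  # 750 "squared" points
print(round(population_variance, 2))    # smaller, because the divisor is larger

# Cross-check against the standard library.
print(variance(scores), pvariance(scores))
```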
49
Variance gives us squared units of
measurement, which can be inconvenient.
If we “unsquare” our variance, we’ll get back to our original units.
s = √s² = √(SS / (N − 1))
…this is the standard deviation.
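Continuing with the same nine scores, a short sketch of the "unsquaring" step. `statistics.stdev` is the standard library's sample standard deviation (it also divides by N − 1), used here only as a cross-check.

```python
import math
from statistics import stdev

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)

# Sample variance is in squared units ...
s_squared = ss / (n - 1)

# ... so take the square root to get back to the original units.
s = math.sqrt(s_squared)

print(round(s, 2))              # about 27.39 points
print(round(stdev(scores), 2))  # same value from the standard library
```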

50
Today’s Topics
• Variability
• Normal Curve …will be covered in the next lecture!
51
Recommended Homework
Problems at the end of textbook Ch. 4 (p. 96-99)
– 1-15, 21, 22, 26-29, 33-34
– For extra practice: 30-32, 35-40
Don’t forget to check the “Corrections to typos in the textbook” document in Canvas under
“Modules—Textbook Resources”
