
PSYC 218 006 (Dr. Chen)
Lecture 4
January 18, 2024
Variability

THESE SLIDES ARE PROVIDED AS A COURTESY AND STUDY AID FOR YOUR PERSONAL USE.
DO NOT REPOST OR REDISTRIBUTE ANY PART OF THESE SLIDES WITHOUT YOUR INSTRUCTOR’S PERMISSION.
2
Announcements
• Check Canvas under “Assignments” for
Assignment 1 (due in 1 week)
• You may work together, but write-ups must be
done on your own and in your own words
• Please do not post public questions on Piazza
seeking answers on the assignments. If in doubt,
send a private message on Piazza to your
instructors
– Use “search for teammates” to find study group
partners

3
Announcements
• “Policy on Use of AI Content Generators” has
been added to the syllabus (p. 5-6)

4
5
6
Today’s Topics
• Finish discussion of central tendency
• Variability
• Normal Curve

7
Measures of Central Tendency

8
Advantages of the mode:
– Easy to calculate and understand
– It is really your only option if you are
working with nominal data
– Useful when you care about the most
“popular” answer (e.g., what size of a
product do most people buy?)

9
Disadvantages of the mode:
– It ignores most of your data set
 10      200
  8       99
  7       83
  7       65
  7       23
  7       18
  4        7
  3        7
  3        7
  2        7
Mode = 7 in both cases
10
• Sum of all scores divided by the total number of scores: $\frac{\sum X}{N}$
• The fulcrum, or balancing point, of all scores
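As a quick check on the mode and the mean, here is a minimal Python sketch using the two data sets from the mode example above. The variable names are illustrative only. It shows that the two sets share a mode but have very different means, and that deviations from the mean sum to zero (the "balancing point" idea).

```python
from statistics import mean, multimode

# The two data sets from the mode example in these slides.
set_a = [10, 8, 7, 7, 7, 7, 4, 3, 3, 2]
set_b = [200, 99, 83, 65, 23, 18, 7, 7, 7, 7]

# multimode returns a list of the most frequent value(s).
mode_a = multimode(set_a)
mode_b = multimode(set_b)

# Mean = sum of all scores divided by the number of scores.
mean_a = mean(set_a)
mean_b = mean(set_b)

# Deviations above and below the mean cancel out, like weights on a fulcrum.
balance_a = sum(x - mean_a for x in set_a)

print(mode_a, mode_b, mean_a, mean_b)
```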

11
Some important notation:

$\bar{X}$ (“X bar”) = mean of a sample

$\mu$ (“mew”) = mean of a population

$\bar{X} = \frac{\sum X}{n}$
12
The Overall Mean
Useful when you want to combine data from
multiple groups of different sizes

Group 1      Group 2

   6            3
   4            2
   3            2
   2            1
   2
   1

Sum = 18     Sum = 8

$\bar{X}_{\text{overall}} = \frac{18 + 8}{6 + 4}$
13
The Overall Mean
What if you have only summary data for each
group?

Group 1 Group 2

Mean = 3 Mean = 2
n=6 n=4
Since, for each group, $\bar{X} = \frac{\sum X}{n}$ or… $\sum X = n\bar{X}$, you can still figure out the sum for each group even if you only know the mean and n
14
The Overall Mean
What if you have only summary data for each group?

Group 1        Group 2

Mean = 3       Mean = 2
n = 6          n = 4
Sum = 6 × 3    Sum = 4 × 2
    = 18           = 8

Since, for each group, $\bar{X} = \frac{\sum X}{n}$ or… $\sum X = n\bar{X}$, you can still figure out the sum for each group even if you only know the mean and n:

$\bar{X}_{\text{overall}} = \frac{(6)(3) + (4)(2)}{6 + 4}$
15
The Overall Mean
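$\bar{X}_{\text{overall}} = \frac{(6)(3) + (4)(2)}{6 + 4}$

General formula: $\bar{X}_{\text{overall}} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + \dots + n_k\bar{X}_k}{n_1 + n_2 + \dots + n_k}$

Note that this formula gives you the same answer as adding up all the raw scores and dividing by the total N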
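To see that the summary-data route and the raw-score route agree, here is a short Python sketch. The function name `overall_mean` is a made-up helper for illustration, not something from the textbook.

```python
def overall_mean(means, sizes):
    """Combine group means into one overall mean, weighting each by its group size."""
    # n * mean recovers each group's sum of scores.
    total_sum = sum(n * m for m, n in zip(means, sizes))
    return total_sum / sum(sizes)

# Raw scores for the two groups shown in the slides.
group_1 = [6, 4, 3, 2, 2, 1]
group_2 = [3, 2, 2, 1]

# Route 1: only the group means and group sizes are known.
from_summary = overall_mean(means=[3, 2], sizes=[6, 4])

# Route 2: pool all raw scores and divide by the total N.
all_scores = group_1 + group_2
from_raw = sum(all_scores) / len(all_scores)

print(from_summary, from_raw)
```

Note that simply averaging the two group means, (3 + 2) / 2 = 2.5, would give a different answer because the groups are not the same size.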

*More practice in your textbook, p.83-85*


16
• The score that is exactly in the middle (half of
the other scores are higher, and half are
lower)
• When there is an even number of scores, the two centermost scores are averaged to get the median.
7              7
6              6
4              5
2              4
2              2
               2

Median = 4     Median = 4.5
17
For raw data tables where the numbers are in
order, you can find the location of the median
using the formula:
$(N+1)/2$

Scores: 7, 6, 4, 2, 2

The location of the median is the 3rd score from the top (or bottom). The value of the median is 4.
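The location formula can be sketched in Python as follows. The helper names `median_location` and `median_value` are hypothetical, and the sketch assumes the scores are already in order. When the location ends in .5 (an even number of scores), the two scores on either side of that location are averaged.

```python
import math
from statistics import median

def median_location(n):
    """Position of the median in an ordered list of n scores, counting from 1."""
    return (n + 1) / 2

def median_value(ordered_scores):
    """Value at the median location; averages the two middle scores when N is even."""
    loc = median_location(len(ordered_scores))
    lower = ordered_scores[math.floor(loc) - 1]  # convert 1-based position to 0-based index
    upper = ordered_scores[math.ceil(loc) - 1]
    return (lower + upper) / 2

odd_set = [7, 6, 4, 2, 2]       # N = 5, location = 3
even_set = [7, 6, 5, 4, 2, 2]   # N = 6, location = 3.5

print(median_location(len(odd_set)), median_value(odd_set))
print(median_location(len(even_set)), median_value(even_set))
```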

18
Notice that the median is the same as the 50th percentile point (P50).

Score    % scores lower
181      90%
172      80%
161      70%
130      60%
 95      50%
 82      40%
 60      30%
 55      20%
 43      10%
 29      0% (1%)

The fine print: since N=10 in this example, the median location is actually halfway between the raw scores of 95 and 82.

For this class, if I ask you “what is the median in this sample?” I will accept either 95 as the answer (since that is the 50th percentile point), or 88.5 (since that is the average of 95 and 82).
19
The median is less sensitive to extreme scores
(outliers) than the mean.

Pagano, p. 87

20
Central Tendency and Symmetry

21
Central Tendency and Skew

The mean gets “pulled into” the tail more than the median

22
Which graph is labeled correctly?

[Figure: four distribution graphs, A through D, each labeled with the mode, median, and mean in a different arrangement]
23
24
Outliers
• Outliers are highly atypical scores
• Reporting the mean of a sample can be
misleading when you have large outliers
– Example: mean net worth in a sample that
happened to include a billionaire

25
Outliers
Possible solutions:
– Report the median
– Report the mean with the outliers removed

This strategy is common when the source of the outlier is suspected to be measurement error
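A small Python sketch of the net worth example, using made-up numbers (none of these values come from the lecture), shows how one extreme score moves the mean far more than the median, and what the two suggested solutions report.

```python
from statistics import mean, median

# Hypothetical net worths in dollars; the last value stands in for the billionaire.
net_worths = [40_000, 55_000, 62_000, 70_000, 85_000, 1_000_000_000]

mean_all = mean(net_worths)             # dragged upward by the single outlier
median_all = median(net_worths)         # solution 1: report the median
mean_trimmed = mean(net_worths[:-1])    # solution 2: report the mean with the outlier removed

print(f"mean with outlier:    {mean_all:,.0f}")
print(f"median:               {median_all:,.0f}")
print(f"mean without outlier: {mean_trimmed:,.0f}")
```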

26
Summary of Central Tendency
Mode
– Pros: easy to calculate; useful for nominal data
– Cons: ignores most of the data; strongly affected by sampling variation
Median
– Pros: less affected by extreme scores (outliers) than the mean
– Cons: somewhat affected by sampling variation
Mean
– Pros: least affected by sampling variation
– Cons: most affected by extreme scores and skew

30
Today’s Topics
• Finish discussion of central tendency
• Variability
• Normal Curve

31
Average Mood vs. Moodiness

Central Tendency Variability

Variability can give us more information than the average alone.

32
No indication of variability…

33
Typical ranges are indicated

34
Variability
If you only have a measure of central tendency
to describe a data set, you won’t be able to
distinguish between:

Data set A and data set B have equal means, medians, and modes.

However, data set B has a greater range than data set A. The data in B vary more.

35
The range is an easy-to-calculate measure of
variability.
Range = highest score – lowest score

However, the range still doesn’t allow us to distinguish between:
Data set A and data set B have equal means, medians, and modes. And equal ranges, too!

However, the scores in data set A are more spread out (they vary more) than the scores in data set B.
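To make this limitation of the range concrete, here is a Python sketch with two hypothetical data sets (the actual sets from the slide figure are not reproduced here). They match on mean, median, mode, and range, yet the scores in A sit farther from the mean on average.

```python
from statistics import mean, median, multimode

# Hypothetical data sets chosen to share the same mean, median, mode, and range.
data_a = [1, 1, 2, 5, 5, 5, 8, 9, 9]
data_b = [1, 4, 5, 5, 5, 5, 5, 6, 9]

def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)

def average_distance_from_mean(scores):
    """Mean absolute distance of each score from the mean."""
    m = mean(scores)
    return sum(abs(x - m) for x in scores) / len(scores)

range_a, range_b = score_range(data_a), score_range(data_b)
dist_a = average_distance_from_mean(data_a)
dist_b = average_distance_from_mean(data_b)

print(range_a, range_b)   # identical ranges
print(dist_a, dist_b)     # A's scores are farther from the mean on average
```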

What else can we use to compare variability?
36
Deviation
On average, the scores in data set A are further
away (or “deviate” more) from the mean than
the scores in data set B.

37
Deviation
Another way we could compare the variability in
the two datasets is to describe the total amount
of deviation of raw scores from the mean.

How can we
express deviation
mathematically?

38
Deviation = distance between a raw score and the mean

For a population: deviation = $X - \mu$

For a sample: deviation = $X - \bar{X}$

39
Sample mean: $\bar{X} = \frac{\sum X}{N} = \frac{450}{9} = 50$

             Raw Score (X)
Amy               10
Brian             20
Christine         30
Debbie            40
Enid              50
Fred              60
Gloria            70
Harriet           80
Ivan              90
40
Sample mean: $\bar{X} = \frac{\sum X}{N} = \frac{450}{9} = 50$

             Raw Score (X)    Deviation from mean ($X - \bar{X}$)
Amy               10               -40
Brian             20               -30
Christine         30               -20
Debbie            40               -10
Enid              50                 0
Fred              60                10
Gloria            70                20
Harriet           80                30
Ivan              90                40
41
Sum of deviations = 0! Not very helpful….
Sample mean: $\bar{X} = \frac{\sum X}{N} = \frac{450}{9} = 50$

             Raw Score (X)    Deviation from mean ($X - \bar{X}$)    Squared deviation ($(X - \bar{X})^2$)
Amy               10               -40                                  1600
Brian             20               -30                                   900
Christine         30               -20                                   400
Debbie            40               -10                                   100
Enid              50                 0                                     0
Fred              60                10                                   100
Gloria            70                20                                   400
Harriet           80                30                                   900
Ivan              90                40                                  1600

Sum of squares (SS) = 6000
42
What is the formula for sum of squares (SS)?
2
A. ∑X − X
B. $(\sum X - \bar{X})^2$

C. $\sum (X - \bar{X})^2$
This one means:
1. For each raw score, find the deviation (distance from the mean)
2. Square each deviation
3. Add up all the squared deviations
D. 2
∑(X − X )
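The three steps for the sum of squares can be sketched in Python using the nine raw scores from the sample table above (Amy through Ivan). Variable names are illustrative.

```python
# Raw scores from the sample table (Amy through Ivan).
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]

sample_mean = sum(scores) / len(scores)

# Step 1: for each raw score, find the deviation from the mean.
deviations = [x - sample_mean for x in scores]

# Step 2: square each deviation.
squared_deviations = [d ** 2 for d in deviations]

# Step 3: add up all the squared deviations.
ss = sum(squared_deviations)

print(sum(deviations))  # raw deviations cancel out
print(ss)               # squared deviations do not
```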

44
An alternative formula for SS is:

$SS = \sum X^2 - \frac{(\sum X)^2}{N}$

*See textbook Tables 4.8 and 4.9 (on pages 92-93) to see an example of SS
calculated with both of these formulas
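As a check that the two SS formulas agree, here is a brief Python sketch using the nine raw scores from the sample table in these slides. The variable names are assumptions for illustration.

```python
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n = len(scores)
sample_mean = sum(scores) / n

# Definitional formula: sum of squared deviations from the mean.
ss_definitional = sum((x - sample_mean) ** 2 for x in scores)

# Alternative (computational) formula: sum of squared raw scores
# minus the squared sum of raw scores divided by N.
sum_x = sum(scores)
sum_x_squared = sum(x ** 2 for x in scores)
ss_computational = sum_x_squared - (sum_x ** 2) / n

print(ss_definitional, ss_computational)
```

The alternative formula avoids computing each deviation, which is convenient for hand calculation when the mean is not a whole number.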

45
Variance
Dividing SS by the sample size N will give us the
variance.

Variance expresses the “average” squared deviation from the mean.

46
Variance
For a population: $\sigma^2 = \frac{SS}{N}$ ($\sigma$ is “sigma”, lower case)

For a sample: $s^2 = \frac{SS}{N - 1}$

47
Variance

For a population: $\sigma^2 = \frac{SS}{N}$

For a sample: $s^2 = \frac{SS}{N - 1}$

Dividing the sample SS by N − 1, rather than N, will lead to a better estimate of the population variance
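One way to see why N − 1 gives a better estimate is to list every possible sample and average the two candidate estimates. The Python sketch below is an illustration under stated assumptions, not part of the lecture: it treats the nine scores from the earlier table as a complete population and draws every ordered sample of size 2 with replacement. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Assumption for this illustration: these nine scores are the whole population.
population = [10, 20, 30, 40, 50, 60, 70, 80, 90]

def sum_of_squares(values):
    """SS: sum of squared deviations from the mean of `values`."""
    m = Fraction(sum(values), len(values))
    return sum((x - m) ** 2 for x in values)

# Population variance: SS / N.
pop_variance = sum_of_squares(population) / len(population)

# Every ordered sample of size 2, drawn with replacement (81 samples).
n = 2
samples = list(product(population, repeat=n))

# Average estimate across all samples, dividing SS by N versus by N - 1.
avg_div_n = sum(sum_of_squares(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)

print(float(pop_variance), float(avg_div_n), float(avg_div_n_minus_1))
```

Dividing by N underestimates the population variance on average, while dividing by N − 1 matches it exactly in this setup.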

For more info (completely optional) see:


https://stats.stackexchange.com/questions/3931/intuitive-explanation-for-dividing-by-n-1-when-calculating-standard-deviation
https://www.youtube.com/watch?v=wpY9o_OyxoQ
48
             Raw Score (X)    Deviation from mean ($X - \bar{X}$)    Squared deviation ($(X - \bar{X})^2$)
Amy               10               -40                                  1600
Brian             20               -30                                   900
Christine         30               -20                                   400
Debbie            40               -10                                   100
Enid              50                 0                                     0
Fred              60                10                                   100
Gloria            70                20                                   400
Harriet           80                30                                   900
Ivan              90                40                                  1600

N = 9
SS = 6000

$s^2 = \frac{SS}{N - 1} = \frac{6000}{9 - 1} = 750$ “squared” points
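The same calculation can be checked in Python. This sketch uses the standard library's `statistics.variance` (which divides by N − 1) and `statistics.pvariance` (which divides by N) alongside the by-hand version. Variable names are illustrative.

```python
from statistics import pvariance, variance

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n = len(scores)
sample_mean = sum(scores) / n
ss = sum((x - sample_mean) ** 2 for x in scores)

# By hand: sample variance divides SS by N - 1.
sample_variance_by_hand = ss / (n - 1)

# Library equivalents.
sample_variance = variance(scores)        # divides by N - 1
population_variance = pvariance(scores)   # divides by N

print(sample_variance_by_hand, sample_variance, population_variance)
```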
49
Variance gives us squared units of
measurement, which can be inconvenient.

If we “unsquare” our variance, we’ll get back to our original units.

$s = \sqrt{s^2} = \sqrt{\frac{SS}{N - 1}}$

…this is the standard deviation.
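For the nine scores used in the variance example, the "unsquaring" step can be sketched in Python as below, with `statistics.stdev` (the sample standard deviation, which uses N − 1) as a cross-check.

```python
import math
from statistics import stdev

scores = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n = len(scores)
sample_mean = sum(scores) / n
ss = sum((x - sample_mean) ** 2 for x in scores)

sample_variance = ss / (n - 1)           # in "squared" points
sample_sd = math.sqrt(sample_variance)   # back in the original units (points)

print(round(sample_sd, 2), round(stdev(scores), 2))
```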


50
Today’s Topics
• Finish discussion of central tendency
• Variability
• Normal Curve …will be covered in the next lecture!

51
Recommended Homework
Problems at the end of textbook Ch. 4 (p. 96-99)
– 1-15, 21, 22, 26-29, 33-34
– For extra practice: 30-32, 35-40

Don’t forget to check the “Corrections to typos in the textbook” document in Canvas under
“Modules—Textbook Resources”
