PSYC 2F23: Stats in Research Design:

Chapter 1:

Pagano: How do we know things?

1) Authority: Facts are considered true because of tradition or because someone of distinction says they are true.
2) Rationalism: Uses reasoning; if the premises are true, the truth can be found through logic.
- What if the premises are faulty? Conclusions are only as strong as the premises they're built on, so verify the premises to verify the conclusion.
3) Intuition: Sudden insight, “EUREKA”. Unreliable and unpredictable.
4) Scientific Method: Reasoning and intuition, but relies on objective assessments.
Steps in Scientific Method:
1) Observations (I noticed something)
2) Theory construction
3) Formulate testable hypotheses (if what I think is true, then this should happen if I do that). Hypotheses must be falsifiable.
4) Empirical testing of the hypothesis (data) (steps 4 and 5 are the focus of this course).
5) Evaluation of empirical results (statistical analysis)
6) Re-evaluate theory.
Peer Review:
- Why is it important? Scientists are good at self-policing; reviewers poke holes in the theory.

- Population: complete set of individuals, objects or scores.
- Sample: A subset of the population.
- Variable: Property that can take on many values depending on the situation.
- Independent Variable (IV): The variable that is manipulated (or used to form the groups) by the experimenter. Typically there are two groups, experimental and control, and the only difference between them should be the IV; the IV is what differs between the groups/conditions.
- Subject IV: Uses existing differences between subjects (called observational studies in the text). Ex: which hand is dominant.
- Experimental IV: Manipulated by the researcher (called true experiments in the text).
- Dependent Variable (DV): The variable measured to determine whether the IV had an effect.
- Notes on Variables: IVs differ between groups (either manipulated or pre-existing); DVs are measured.
- Statistic: Number based on sample data to quantify a characteristic of the sample.
- Parameter: A number based on population data to quantify a characteristic of the pop.
- Keep the S's and P's together (population goes with parameter, sample goes with statistic).
Random Sampling:
- We very seldom have the opportunity or time to study populations.
- We therefore rely on info gathered from samples.
- In order to use this information to predict something about the population the sample
must be carefully selected.
- Most fundamental method of obtaining a sample is random sampling
- Def: Every member of the population has an equal chance of being selected into the sample.
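A minimal sketch of simple random sampling, also showing the parameter/statistic pairing from the definitions above. The population of 1,000 scores is made up for illustration, and `random_sample` is a hypothetical helper name; it relies on Python's `random.Random.sample`, which draws without replacement so every member has an equal chance of selection.

```python
import random
import statistics

# Hypothetical population of 1,000 scores (illustration only).
population = list(range(1, 1001))

# Parameter: a number computed from the whole population.
mu = statistics.mean(population)

def random_sample(pop, n, seed=None):
    """Simple random sample of size n, drawn without replacement:
    every member of pop has an equal chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(pop, n)

sample = random_sample(population, 50, seed=2023)

# Statistic: the same kind of number, computed from the sample only.
x_bar = statistics.mean(sample)

print(f"parameter (population mean) = {mu}")
print(f"statistic (sample mean)     = {x_bar:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; that gap is what inferential statistics deals with.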
Descriptive Vs Inferential Statistics:
- Descriptive statistics are used to describe your data
- Usually involves central tendency (typical score), shape of distribution, variability
(spread of scores)
- Inferential Statistics are used to infer something about the population based on the
sample data.
Chapter 2:
- Used to express populations
- One capital letter 
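- Σ (capital Greek sigma) is used for summation: $\sum_{i=1}^{N} X_i = X_1 + X_2 + \dots + X_N$
- $\sum X^2$ is different from $(\sum X)^2$:
- $\sum X^2$: square every score in the dataset, then add the squares.
- $(\sum X)^2$: add all the scores first, then square the total (the square applies to the whole parenthesis).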
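A tiny worked example of why $\sum X^2$ and $(\sum X)^2$ differ, using the hypothetical scores 1, 2, 3:

```python
scores = [1, 2, 3]  # hypothetical dataset

# ΣX²: square each score first, then add: 1 + 4 + 9
sum_of_squared_scores = sum(x ** 2 for x in scores)

# (ΣX)²: add the scores first, then square the total: (1 + 2 + 3)²
square_of_sum = sum(scores) ** 2

print(sum_of_squared_scores)  # 14
print(square_of_sum)          # 36
```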
Measurement Scales:
1) Nominal (Categorical)
- Used to identify, name, or categorize data.
- Limited mathematical attributes: counting, = or ≠
- Ex: Car make, gender, school major
2) Ordinal (Ranked)
- Rank/put scores in order
- <, =, >
- All attributes of nominal measurement
- Ex: Place in a race. The distances between ranks aren't necessarily equal.
3) Interval
- All attributes of nominal and ordinal
- Equal units (intervals)
- The distance between adjacent points is the same
- Allows statements like A-B = C or A-B > C-D
- Ex: Temperature (°C or °F), IQ, almost any questionnaire or test score
- 2F23 simplification: a questionnaire or test score counts as interval only if it has more than 25 possible values.
4) Ratio
- Like interval, plus a true zero (absolute absence of the attribute; the zero can be theoretical)
- Allows ratios (A/B)
- Ex: Height, Weight (theoretical)
- Ex: Money (logic)
5) Difference Between Interval and Ratio:
- True zero (can be theoretical)
- Basic rule: Is 4 really twice as much as 2? If so, the scale is ratio.
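A quick numeric check of the "is 4 really twice 2" rule, using temperature from the interval examples. On Celsius, 20 looks like "twice" 10, but converting the same two temperatures to Fahrenheit gives a different ratio, so the ratio is not meaningful on an interval scale. Kelvin has a true zero, so it is a ratio scale. The two temperatures are arbitrary choices for illustration.

```python
def c_to_f(c):
    """Celsius -> Fahrenheit (interval scale, arbitrary zero)."""
    return c * 9 / 5 + 32

def c_to_k(c):
    """Celsius -> Kelvin (ratio scale, true zero)."""
    return c + 273.15

low_c, high_c = 10.0, 20.0

ratio_c = high_c / low_c                  # 2.0 on Celsius
ratio_f = c_to_f(high_c) / c_to_f(low_c)  # 68 / 50 on Fahrenheit
ratio_k = c_to_k(high_c) / c_to_k(low_c)  # the physically meaningful ratio

print(round(ratio_c, 3), round(ratio_f, 3), round(ratio_k, 3))  # 2.0 1.36 1.035
```

The same pair of temperatures gives three different "ratios", which is exactly why A/B statements need a true zero.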
Figure 18.1 Decision flowchart for choosing the appropriate inference test.
- Measurement scale here refers to the scale of the dependent variable.

Continuous Variable: Can take on an infinite number of values (there is always another possible value between any two values).

- Ex: height, weight
Discrete Variables: Can take on only a set number of separate values, with no values in between.
- Ex: Number of people, number of goals, number of people who finish a race
Careful for continuous variables disguised as discrete:
- Often round continuous variables to the nearest convenient unit
- Round height up and weight down.
Real Limits:
- Because we round continuous variables we need to know what the limits are.
- Real limits lie at the midpoint between adjacent values: ½ of the smallest measurement unit above and below the recorded value.
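A minimal sketch of the real-limits rule: half of the smallest measurement unit below and above the recorded (rounded) value. `real_limits` is a hypothetical helper name, and the example measurements are made up.

```python
def real_limits(value, unit):
    """Return (lower, upper) real limits of a value recorded to the nearest `unit`."""
    half = unit / 2
    return value - half, value + half

# Weight recorded to the nearest pound.
print(real_limits(150, 1))    # (149.5, 150.5)

# Height recorded to the nearest tenth of a unit (floating-point, so approximate).
print(real_limits(5.3, 0.1))  # approximately (5.25, 5.35)
```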

Chapter 3:

Frequency Distributions:
- A frequency distribution presents the scores and their frequency of occurrence.
- Raw data are disorganized and hard to interpret.
Grouping Scores:
- Group Scores into Intervals
- Frequency Distribution of Grouped Scores.
- Loss of detail
Making a Frequency Distribution of Grouped Scores:
1) Determine the range: highest score minus lowest score
2) Calculate the interval width (i)
3) List the limits of each class interval
4) Tally the raw scores into the appropriate class intervals
5) Determine the frequency of each interval.
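A sketch of the five steps above for whole-number scores. Several details are assumptions rather than the course's stated rules: about 10 intervals, i = range / number of intervals rounded up to a whole number, and the lowest interval starting at a multiple of i. `grouped_frequency` is a hypothetical helper and the scores are made up.

```python
import math

def grouped_frequency(scores, n_intervals=10):
    """Sketch of steps 1-5 for whole-number scores (assumed)."""
    # 1) Range: highest score minus lowest score.
    low, high = min(scores), max(scores)
    score_range = high - low
    # 2) Interval width i: range / number of intervals, rounded up (assumed rule).
    i = max(1, math.ceil(score_range / n_intervals))
    # 3) List interval limits, starting at a multiple of i at or below the lowest score.
    table = []
    lower = (low // i) * i
    while lower <= high:
        upper = lower + i - 1
        # 4) Tally the scores that fall in [lower, upper]; 5) record the frequency.
        f = sum(1 for x in scores if lower <= x <= upper)
        table.append((lower, upper, f))
        lower += i
    # Conventional layout: highest interval at the top.
    return i, table[::-1]

scores = [2, 5, 7, 11, 12, 15, 18, 21, 23, 29]  # hypothetical data
width, table = grouped_frequency(scores)
print(f"i = {width}")
for lo_limit, hi_limit, freq in table:
    print(f"{lo_limit:>3}-{hi_limit:<3} {freq}")
```

The frequencies always add back up to the number of scores, but the individual scores inside each interval are no longer visible, which is the "loss of detail" noted above.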

Graphing Frequency Distributions

- Two axes: horizontal (abscissa) and vertical (ordinate)
- Scores on the horizontal axis
- Frequency of scores on the vertical axis
- Use appropriate scales
- Label the axes and title the graph.
- Nominal/Ordinal data (bar graph):
  - Categories or ranks on the horizontal axis
  - Bars don't touch
  - This indicates visually that the scale is discrete.
- Interval or ratio data (histogram):
  - Bars touch, because the data are continuous
  - Class intervals on the horizontal axis
  - The edges of each bar are the real limits of its interval, so bars start and end on the x axis at the real limits
  - The midpoint of each interval is plotted on the horizontal axis
  - The height of each bar shows the frequency of scores in that interval
Frequency Polygon:
- Interval or ratio
- Same information as histograms
- Lines instead of bars
- Starts one interval below the lowest interval
- Ends one interval above the highest interval
Cumulative Percentage Graph:
- The cum% column of the frequency distribution in graph form
- Cumulative percentage on the vertical axis
- Intervals on the horizontal axis (plot each point at the interval's upper real limit)
- Always increasing (never goes down)
- Cum% curves can be used to estimate percentiles and percentile ranks
- PR: Start on the horizontal axis at the score, go up to the cum% curve, then across to read the percentage of scores.
- Percentile: Start on the vertical axis at a particular %, go across to the curve, then down to find the score.
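A sketch of reading a cumulative percentage curve in both directions. It assumes the curve is drawn as straight lines between the plotted points (starting at 0% at the lowest real limit), so "reading the graph" becomes linear interpolation. The grouped data and the helper names (`interpolate`, `percentile_rank`, `percentile_point`) are hypothetical.

```python
def interpolate(x, xs, ys):
    """Linearly interpolate y at x along the points (xs, ys); xs must be non-decreasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            if x1 == x0:  # flat stretch, e.g. from a zero-frequency interval
                return y0
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)

# Hypothetical grouped data: intervals 60-69, 70-79, 80-89, 90-99.
lowest_real_limit = 59.5
upper_real_limits = [69.5, 79.5, 89.5, 99.5]
freqs = [2, 5, 8, 5]

# Cum% at each upper real limit.
n = sum(freqs)
cum, cum_pct = 0, []
for f in freqs:
    cum += f
    cum_pct.append(100 * cum / n)

xs = [lowest_real_limit] + upper_real_limits  # scores (horizontal axis)
ys = [0.0] + cum_pct                          # cum% (vertical axis)

def percentile_rank(score):
    """Score -> PR: up from the score to the curve, then across."""
    return interpolate(score, xs, ys)

def percentile_point(percent):
    """Percent -> score: across from the % to the curve, then down."""
    return interpolate(percent, ys, xs)

print(cum_pct)               # [10.0, 35.0, 75.0, 100.0]
print(percentile_rank(85))   # approximately 57.0
print(percentile_point(50))  # 83.25
```

The two functions are inverses of each other along the same curve, mirroring the "up then across" and "across then down" reading rules.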
Shape of Curves:
- Symmetrical: Same on both sides; folds onto itself when folded in half.
- Skewed: Not symmetrical.
- Positive Skew: More low scores than high scores; the ‘tail’ goes off to the right.
- Negative Skew: More high scores than low scores; the tail goes off to the left.
Exploratory Data Analysis:
- One powerful way to display data is the stem-and-leaf diagram.
- With a stem-and-leaf diagram we get a sense of the shape of the distribution.
- We get both a frequency distribution and a histogram (turned sideways) without losing any of the original information.
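A minimal stem-and-leaf sketch, assuming non-negative whole-number scores with the tens digit as the stem and the units digit as the leaf (other stem choices are possible). `stem_and_leaf` is a hypothetical helper and the scores are made up.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Stems = tens digits, leaves = units digits (non-negative whole numbers assumed)."""
    leaves = defaultdict(list)
    for x in sorted(scores):
        leaves[x // 10].append(x % 10)
    lines = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        lines.append((f"{stem} | " + " ".join(str(d) for d in leaves[stem])).rstrip())
    return lines

scores = [62, 67, 71, 75, 75, 80, 83, 88, 94, 49]  # hypothetical data
for line in stem_and_leaf(scores):
    print(line)
```

Every original score can be read back from the display (stem 7 with leaf 5 is 75), which is why no information is lost.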
Chapter 4: Measures of Central Tendency:
- Three measures: mean, median, and mode.
- Mean: Sum of all scores divided by the number of scores, $\bar{X} = \frac{\sum X}{N}$.
Properties of Mean:
1) Sensitive to the exact value of all the scores in the distribution.
2) The sum of the deviations from the mean is 0: $\sum (X - \bar{X}) = 0$.
3) Sensitive to extreme scores.
4) The sum of the squared deviations of all scores from the mean is a minimum. This is another reason why we typically use the mean to represent/characterize a sample: any other number will deviate further from the actual scores.
5) Of the three measures, the mean is the least sensitive to sampling variation.
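A small numeric check of properties 2 and 4, using hypothetical scores. Comparing the mean against a handful of other candidate values only demonstrates property 4 for this dataset; it is not a proof. `sum_sq_dev` is a hypothetical helper name.

```python
scores = [2, 4, 6, 8, 10]           # hypothetical data
mean = sum(scores) / len(scores)    # 6.0

# Property 2: deviations from the mean sum to zero.
deviation_sum = sum(x - mean for x in scores)

def sum_sq_dev(c):
    """Sum of squared deviations of the scores from an arbitrary value c."""
    return sum((x - c) ** 2 for x in scores)

# Property 4: no other value gives a smaller sum of squared deviations.
candidates = [4, 5, 5.5, 6, 6.5, 7, 8]
best = min(candidates, key=sum_sq_dev)

print(deviation_sum)     # 0.0
print(sum_sq_dev(mean))  # 40.0
print(sum_sq_dev(5))     # 45
print(best)              # 6
```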
Median: The value below which 50% of the scores fall (P50, the 50th percentile point).
- Ungrouped scores: Rank-order the scores from smallest to largest. If there is an odd number of scores, the Mdn is the centermost score; if there is an even number of scores, the Mdn is the average of the two centermost scores.
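The rank-order rule for ungrouped scores, written out as a short sketch (the function name and data are hypothetical; Python's `statistics.median` applies the same rule):

```python
def median(scores):
    """Median of ungrouped scores using the rank-order rule."""
    ordered = sorted(scores)  # rank-order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]   # odd N: the centermost score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even N: average of the two centermost

print(median([7, 1, 5]))      # 5
print(median([7, 1, 5, 10]))  # 6.0
```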
Properties of Median:
1) Mdn is less sensitive to extreme scores.
2) More sensitive to sampling variability than mean
Properties of Mode:
1) Very sensitive to sampling variation
2) Not sensitive to extreme scores.
Which Central Tendency:
- Nominal: Mode
- Ordinal: Mode or median, depending on the situation
- Interval & Ratio data: Mean or median is most useful, but mode is sometimes used.
Relationship Between Measures of Central Tendency and Symmetry of Distributions
- Unimodal & Symmetrical: Mean = Mode = Median
- Negative Skew: Mean < Median < Mode
- Positive Skew: Mode < Median < Mean
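An illustration of the positive-skew ordering with a small hypothetical dataset, where one high score stretches the tail to the right. With tiny or unusual datasets the ordering does not always hold this cleanly, so treat this as one example rather than a guarantee.

```python
import statistics

# Hypothetical positively skewed scores: one extreme high score.
scores = [1, 2, 2, 3, 4, 5, 12]

mo = statistics.mode(scores)     # 2
mdn = statistics.median(scores)  # 3
x_bar = statistics.mean(scores)  # 29/7, about 4.14

print(mo < mdn < x_bar)          # True: Mode < Median < Mean
```

The extreme score pulls the mean furthest toward the tail, consistent with the mean being the most sensitive to extreme scores.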
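Sum of Squares (SS):
- Deviation formula: $SS = \sum (X - \bar{X})^2$
- Computational formula: $SS = \sum X^2 - \frac{(\sum X)^2}{N}$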

Properties of Standard Deviation:

1) Measure of dispersion relative to mean
2) Sensitive to each score in the distribution
3) Relatively stable with regard to sampling fluctuations.
4) Can be manipulated mathematically.
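A sketch computing SS with the deviation and computational formulas and then the standard deviation from SS, on hypothetical scores. Both divisors are shown (N for a population SD, N − 1 for a sample estimate); the course text specifies which one applies in a given problem.

```python
import math

scores = [2, 4, 6, 8, 10]  # hypothetical data
n = len(scores)
mean = sum(scores) / n

# Deviation formula: SS = Σ(X - mean)²
ss_deviation = sum((x - mean) ** 2 for x in scores)

# Computational formula: SS = ΣX² - (ΣX)²/N
ss_computational = sum(x ** 2 for x in scores) - sum(scores) ** 2 / n

# SD from SS: divide by N (population) or N - 1 (sample estimate), then take the square root.
sigma = math.sqrt(ss_deviation / n)
s = math.sqrt(ss_deviation / (n - 1))

print(ss_deviation, ss_computational)  # 40.0 40.0
print(round(sigma, 3), round(s, 3))    # 2.828 3.162
```

Both SS formulas give the same number; the computational one avoids subtracting the mean from every score, which is handy by hand.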
Chapter 5: Normal Curve and Standard Scores:
Properties of Normal Distribution:
1) Symmetrical
2) Unimodal
3) Asymptotic to the horizontal axis.
4) The area under the curve within a given number of standard deviations of the mean is constant for all normal distributions.

Why is it Important:
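1) Many naturally occurring variables are approximately normally distributed.
2) Many statistical tests covered later rely on the normal distribution.
3) Many sampling distributions approach the normal distribution as sample size increases.
- Sampling Distribution: The distribution of a statistic (e.g., the mean) computed from every possible sample of a given size drawn from the population.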
Calculate Percentage Between Mean and Particular Score:
- Raw score → z score.
Standard (Z) Scores:
- A z score is a transformed score that designates how many standard deviation units the corresponding raw score is above or below the mean: $z = \frac{X - \mu}{\sigma}$ (population) or $z = \frac{X - \bar{X}}{s}$ (sample).
Column B: Area between the mean and the z score (Table A).

What questions can we answer with z scores:

1. Finding area beyond a particular raw score
2. Finding area below a particular raw score
3. Find percentile rank of a particular raw score
4. Finding area between two raw scores
5. Finding particular raw score(s) for a given area
6. Finding percentile point for a given percentage
7. Finding the actual number of cases below a particular raw score (or z score)
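A sketch answering the question types above with Python's `statistics.NormalDist` in place of the z table (the table gives the same areas to about four decimals). The distribution (mean 100, SD 15, 1,000 cases) and the particular raw scores are hypothetical.

```python
from statistics import NormalDist

# Hypothetical normally distributed scores.
mu, sigma, n_cases = 100, 15, 1000
std_normal = NormalDist()  # standard normal: mean 0, SD 1

def z_score(x):
    return (x - mu) / sigma

x = 130                                               # z = 2.0
area_below = std_normal.cdf(z_score(x))               # Q2: area below 130, about 0.9772
area_beyond = 1 - area_below                          # Q1: area beyond 130, about 0.0228
pr = 100 * area_below                                 # Q3: percentile rank of 130, about 97.72
area_between = std_normal.cdf(z_score(130)) - std_normal.cdf(z_score(85))  # Q4: about 0.8186
x_top_10 = mu + std_normal.inv_cdf(0.90) * sigma      # Q5: score cutting off the top 10%, about 119.22
p75 = mu + std_normal.inv_cdf(0.75) * sigma           # Q6: 75th percentile point, about 110.12
cases_below = area_below * n_cases                    # Q7: number of cases below 130, about 977

print(f"{area_beyond:.4f} {area_below:.4f} {pr:.2f} {area_between:.4f}")
print(f"{x_top_10:.2f} {p75:.2f} {cases_below:.0f}")
```

Questions 1 to 4 and 7 go raw score → z → area; questions 5 and 6 run the other way, area → z → raw score.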

Converting to z scores standardizes any distribution, regardless of its original mean or SD. Once standardized, the distribution always has a mean of 0 and an SD of 1.
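A quick check that standardizing gives a mean of 0 and an SD of 1, using hypothetical raw scores and the population SD (`statistics.pstdev`):

```python
import statistics

raw = [55, 60, 65, 70, 75, 80, 85]  # hypothetical raw scores
mu = statistics.mean(raw)           # 70
sigma = statistics.pstdev(raw)      # 10.0

z_scores = [(x - mu) / sigma for x in raw]

print(z_scores)                                # [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
print(round(statistics.mean(z_scores), 10))    # 0.0
print(round(statistics.pstdev(z_scores), 10))  # 1.0
```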

Internal Validity:
- An experiment/study has internal validity if changes to the DV are attributable to
changes in the IV.
Threats to Internal Validity:
1) Subject selection bias: Fix by assigning subjects to groups randomly.
2) Testing effect: Separate the tests in time, use alternate versions of the test, add a control group.
3) Instrumentation changes
4) Statistical regression: Extreme scores tend to move back toward the mean when retested. Fix: don't select subjects based on extreme scores.
5) Subject maturation
6) History
7) Subject mortality (dropout): Work harder to retain subjects; stay in contact.
8) Diffusion: Control and experimental groups can communicate with each other. Keep the groups apart.
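A small simulation of statistical regression (threat 4). It assumes a simple hypothetical model where each observed score is a stable true score plus independent random error; subjects selected for extreme first-test scores score closer to the overall mean on the retest even though nothing was done to them. All numbers (means, SDs, 10,000 subjects, top 5%) are made up for illustration.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical model: observed score = true score + independent measurement error.
true_scores = [rng.gauss(100, 10) for _ in range(10_000)]
test1 = [t + rng.gauss(0, 10) for t in true_scores]
test2 = [t + rng.gauss(0, 10) for t in true_scores]

# Select subjects *because* they scored in the top 5% on the first test.
cutoff = sorted(test1)[int(0.95 * len(test1))]
selected = [i for i, score in enumerate(test1) if score >= cutoff]

mean1 = statistics.mean(test1[i] for i in selected)
mean2 = statistics.mean(test2[i] for i in selected)

print(f"selected group, test 1: {mean1:.1f}")
print(f"selected group, test 2: {mean2:.1f}  (closer to the overall mean of about 100)")
```

Without a control group, that drop could be mistaken for an effect of the IV, which is why selecting on extreme scores threatens internal validity.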
Chapter 6:
- First interpretation of Pearson r: A measure of the extent to which paired scores occupy the same or opposite positions within their own distributions.
- Second interpretation: Pearson r can also be understood in terms of the proportion of variability in one variable (Y) accounted for by another (X); this proportion is $r^2$.
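A sketch of the first interpretation: r as the average product of paired z scores. Pairs on the same side of their means contribute positive products, and pairs on opposite sides contribute negative ones. This version uses population SDs and divides by N; using sample SDs and dividing by N − 1 gives the same r. The paired data and `pearson_r` are hypothetical, and `r ** 2` gives the proportion of variability in the second interpretation.

```python
import math

def pearson_r(xs, ys):
    """Pearson r as the mean product of paired z scores (population SDs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / n

x = [1, 2, 3, 4, 5]  # hypothetical paired scores
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 4))       # 0.7746
print(round(r ** 2, 4))  # 0.6
```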

