BPSY 198

FIRST SEMESTER | A.Y. 2022-2023

A STATISTICS REFRESHER

Test scores are frequently expressed as numbers, and statistical tools are used to describe, make inferences from, and draw conclusions about numbers.

Why We Need Statistics
1. Statistics are used for purposes of description. Numbers provide convenient summaries and allow us to evaluate some observations relative to others (Maul, Irribarra, & Wilson, 2016). For example, if you get a score of 54 on a psychology examination, you probably want to know what the 54 means.
2. We can use statistics to make inferences, which are logical deductions about events that cannot be observed directly. For example, you do not know how many people watched a particular television movie unless you ask everyone. However, by using scientific sample surveys, you can infer the percentage of people who saw the film.

Descriptive statistics
 Descriptive statistics are methods used to provide a concise description of a collection of quantitative information.

Inferential statistics
 Inferential statistics are methods used to make inferences from observations of a small group of people known as a sample to a larger group of individuals known as a population.
 Typically, the psychologist wants to make statements about the larger group but cannot possibly make all the necessary observations. Instead, he or she observes a relatively small group of subjects (sample) and uses inferential statistics to estimate the characteristics of the larger group (Schweder & Hjort, 2016).

Measurement
 We may formally define measurement as the act of assigning numbers or symbols to characteristics of things (people, events, whatever) according to rules.

Scale
 A scale is a set of numbers (or other symbols) whose properties model empirical properties of the objects to which the numbers are assigned.
 There are various ways in which a scale can be categorized. One way of categorizing a scale is according to the type of variable being measured. Thus, a scale used to measure a continuous variable might be referred to as a continuous scale, whereas a scale used to measure a discrete variable might be referred to as a discrete scale.

Error
 Measurement always involves error. In the language of assessment, error refers to the collective influence of all of the factors on a test score or measurement beyond those specifically measured by the test or measurement.

Properties of Scales
Magnitude
 Magnitude is the property of “moreness.”
 A scale has the property of magnitude if we can say that a particular instance of the attribute represents more, less, or equal amounts of the given quantity than does another instance.
 On a scale of height, for example, if we can say that John is taller than Fred, then the scale has the property of magnitude.
Equal Intervals
 A scale has the property of equal intervals if the difference between two points at any place on the scale has the same meaning as the difference between two other points that differ by the same number of scale units.
 For example, the difference between inch 2 and inch 4 on a ruler represents the same quantity as the difference between inch 10 and inch 12: exactly 2 inches.
 When a scale has the property of equal intervals, the relationship between the measured units and some outcome can be described by a straight line or a linear equation.
Absolute 0
 An absolute 0 is obtained when nothing of the property being measured exists.
 For many psychological qualities, it is extremely difficult, if not impossible, to define an absolute 0 point.
 For example, if one measures shyness on a scale from 0 through 10, then it is hard to define what it means for a person to have absolutely no shyness.

Scales of Measurement
Nominal Scales
 Nominal scales are the simplest form of measurement. As a matter of fact, nominal scales are really not scales at all; their only purpose is to name objects. For example, the numbers on the backs of football players’ uniforms are nominal. Nominal scales are used when the information is qualitative rather than quantitative.
 For example, in the specialty area of clinical psychology, a nominal scale in use for many years is the Diagnostic and Statistical Manual of Mental Disorders.
 Individual test items may also employ nominal scaling, including yes/no responses. For example, consider yes/no test items that ask whether the testtaker is suicidal, is under care for a psychiatric disorder, or has been convicted of a felony.
 In each case, a yes or no response results in the placement into one of a set of mutually exclusive groups: suicidal or not, under care for psychiatric disorder or not, and felon or not.
Ordinal Scales
 A scale with the property of magnitude but not equal intervals or an absolute 0 is an ordinal scale.
 This scale allows you to rank individuals or objects but not to say anything about the meaning of the differences between the ranks.
 In business and organizational settings, job applicants may be rank-ordered according to their desirability for a position. In clinical settings, people on a waiting list for psychotherapy may be rank-ordered according to their need for treatment.
 Ordinal scales imply nothing about how much greater one ranking is than another. Even though ordinal scales may employ numbers or “scores” to represent the rank ordering, the numbers do not indicate units of measurement.
 Ordinal scales have no absolute zero point. In the case of a test of job performance ability, every testtaker, regardless of standing on the test, is presumed to have some ability. No testtaker is presumed to have zero ability.
Interval Scales
 When a scale has the properties of magnitude and equal intervals but not absolute 0, we refer to it as an interval scale.
 With interval scales, we have reached a level of measurement at which it is possible to average a set of measurements and obtain a meaningful result.
 The most common example of an interval scale is the measurement of temperature in degrees Fahrenheit. The Celsius scale of temperature is also an interval rather than a ratio scale. Although 0 represents freezing on the Celsius scale, it is not an absolute 0.
 Scores on many tests, such as tests of intelligence, are analyzed statistically in ways appropriate for data at the interval level of measurement.
Ratio Scales
 A scale that has all three properties (magnitude, equal intervals, and an absolute 0) is called a ratio scale.
 For example, consider the number of yards gained by running backs on football teams. Zero yards actually means that the player has gained no yards at all.
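
The link between the three properties of scales (magnitude, equal intervals, absolute 0) and the four scales of measurement can be sketched in a few lines of Python. This is only an illustration: the names `ScaleProperties` and `classify_scale` are hypothetical, and the sketch assumes that the properties build on one another in the order given in these notes.

```python
from typing import NamedTuple


class ScaleProperties(NamedTuple):
    """Hypothetical container for the three properties of a scale."""
    magnitude: bool
    equal_intervals: bool
    absolute_zero: bool


def classify_scale(props: ScaleProperties) -> str:
    """Name the scale of measurement implied by the properties.

    Assumption: the properties are cumulative (equal intervals presupposes
    magnitude, absolute 0 presupposes both), as in the notes.
    """
    if props.magnitude and props.equal_intervals and props.absolute_zero:
        return "ratio"
    if props.magnitude and props.equal_intervals:
        return "interval"
    if props.magnitude:
        return "ordinal"
    return "nominal"


# Examples taken from the notes.
examples = {
    "uniform numbers": ScaleProperties(False, False, False),
    "waiting-list rank": ScaleProperties(True, False, False),
    "degrees Fahrenheit": ScaleProperties(True, True, False),
    "yards gained": ScaleProperties(True, True, True),
}

for name, props in examples.items():
    print(f"{name}: {classify_scale(props)}")
```

Combinations that the notes do not describe (for example, equal intervals without magnitude) simply fall through to the nearest lower scale in this sketch.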


Measurement in Psychology
 The ordinal level of measurement is most frequently used in psychology.
 As Kerlinger (1973) put it, “Intelligence, aptitude, and personality test scores are, basically and strictly speaking, ordinal. These tests indicate with more or less accuracy not the amount of intelligence, aptitude, and personality traits of individuals, but rather the rank-ordered positions of the individuals.” Kerlinger allowed that “most psychological and educational scales approximate interval equality fairly well,” though he cautioned that if ordinal measurements are treated as if they were interval measurements, then the test user must “be constantly alert to the possibility of gross inequality of intervals.”
 Why would psychologists want to treat their assessment data as interval when those data would be better described as ordinal? The attraction of interval measurement for users of psychological tests is the flexibility with which such data can be manipulated statistically.

Describing Data
Raw Score
 A raw score is a straightforward, unmodified accounting of performance that is usually numerical.
 A raw score may reflect a simple tally, as in the number of items answered correctly on an achievement test.
 Raw scores can be converted into other types of scores.
 A distribution may be defined as a set of test scores arrayed for recording or study.
Frequency Distribution
 The data from a test could be organized into a distribution of the raw scores. One way the scores could be distributed is by the frequency with which they occur.
 In a frequency distribution, all scores are listed alongside the number of times each score occurred. The scores might be listed in tabular or graphic form.
 Often, a frequency distribution is referred to as a simple frequency distribution to indicate that individual scores have been used and the data have not been grouped.
 Another kind of frequency distribution used to summarize data is the grouped frequency distribution. In a grouped frequency distribution, test score intervals, also called class intervals, replace the actual test scores.
Graph
 Frequency distributions of test scores can also be illustrated graphically. A graph is a diagram or chart composed of lines, points, bars, or other symbols that describe and illustrate data.
Histogram
 A histogram is a graph with vertical lines drawn at the true limits of each test score (or class interval), forming a series of contiguous rectangles.
Bar Graph
 In a bar graph, numbers indicative of frequency also appear on the Y-axis, and categorization appears on the X-axis.
Frequency Polygon
 A frequency polygon is a graph expressed by a continuous line connecting the points where test scores or class intervals meet frequencies.

Measures of Central Tendency
 A measure of central tendency is a statistic that indicates the average or midmost score between the extreme scores in a distribution.
 The center of a distribution can be defined in different ways. Perhaps the most commonly used measure of central tendency is the arithmetic mean (or, more simply, mean), which is referred to in everyday language as the “average.” The mean takes into account the actual numerical value of every score.
 In special instances, such as when there are only a few scores and one or two of the scores are extreme in relation to the remaining ones, a measure of central tendency other than the mean may be desirable. Other measures of central tendency are the median and the mode.
 Note that, in the formulas to follow, the standard statistical shorthand called “summation notation” (summation meaning “the sum of”) is used. The Greek uppercase letter sigma, Σ, is the symbol used to signify “sum”; if X represents a test score, then the expression ΣX means “add all the test scores.”
The mean
 The arithmetic average score in a distribution is called the mean.
 To calculate the mean, we total the scores and divide the sum by the number of cases, or N. Thus, the formula for the mean, which we signify as X̄, is
X̄ = ΣX / N
 The arithmetic mean is typically the most appropriate measure of central tendency for interval or ratio data when the distributions are believed to be approximately normal.
 An arithmetic mean can also be computed from a frequency distribution. The formula for doing this is X̄ = Σ(fX)/n, where Σ(fX) means “multiply the frequency of each score by its corresponding score and then sum.”
The median
 The median, defined as the middle score in a distribution, is another commonly used measure of central tendency.
 We determine the median of a distribution of scores by ordering the scores in a list by magnitude, in either ascending or descending order.
 If the total number of scores ordered is an odd number, then the median will be the score that is exactly in the middle, with one-half of the remaining scores lying above it and the other half of the remaining scores lying below it.

 When the total number of scores ordered is an even number, then the median can be calculated by determining the arithmetic mean of the two middle scores.
The mode
 The most frequently occurring score in a distribution of scores is the mode.
 Distributions that contain a tie for the designation “most frequently occurring score” can have more than one mode.
 Consider the following scores, arranged in no particular order, obtained by 20 students on the final exam of a new trade school called the Home Study School of Elvis Presley Impersonators:

51 49 51 50 66 52 53 38 17 66
33 44 73 13 21 91 87 92 47 3

 These scores are said to have a bimodal distribution because there are two scores (51 and 66) that occur with the highest frequency (of two). Except with nominal data, the mode tends not to be a very commonly used measure of central tendency.

Measures of Variability
 Variability is an indication of how scores in a distribution are scattered or dispersed.
 Statistics that describe the amount of variation in a distribution are referred to as measures of variability. Some measures of variability include the range, the interquartile range, the semi-interquartile range, the average deviation, the standard deviation, and the variance.
Range
 The range of a distribution is equal to the difference between the highest and the lowest scores.
 The range is the simplest measure of variability to calculate, but its potential use is limited. Because the range is based entirely on the values of the lowest and highest scores, one extreme score (if it happens to be the lowest or the highest) can radically alter the value of the range.
The interquartile and semi-interquartile ranges
 A distribution of test scores (or any other data, for that matter) can be divided into four parts such that 25% of the test scores occur in each quarter.
 As shown in the picture below, the dividing points between the four quarters in the distribution are the quartiles. There are three of them, respectively labeled Q1, Q2, and Q3. Note that quartile refers to a specific point whereas quarter refers to an interval.
[Figure: a distribution divided into quarters by the quartiles Q1, Q2, and Q3]
 The interquartile range is a measure of variability equal to the difference between Q3 and Q1. Like the median, it is an ordinal statistic.
 A related measure of variability is the semi-interquartile range, which is equal to the interquartile range divided by 2.
 Knowledge of the relative distances of Q1 and Q3 from Q2 (the median) provides the seasoned test interpreter with immediate information as to the shape of the distribution of scores.
 In a perfectly symmetrical distribution, Q1 and Q3 will be exactly the same distance from the median. If these distances are unequal then there is a lack of symmetry. This lack of symmetry is referred to as skewness.
The average deviation
 Another tool that could be used to describe the amount of variability in a distribution is the average deviation, or AD for short.
 Its formula is AD = Σ|x|/n. The lowercase italic x in the formula signifies a score’s deviation from the mean. The value of x is obtained by subtracting the mean from the score (X − mean = x). The bars on each side of x indicate that it is the absolute value of the deviation score (ignoring the positive or negative sign and treating all deviation scores as positive).
 All the deviation scores are then summed and divided by the total number of scores (n) to arrive at the average deviation.
 The average deviation is rarely used. Perhaps this is so because the deletion of algebraic signs renders it a useless measure for purposes of any further operations.
The Standard Deviation
 We may define the standard deviation as a measure of variability equal to the square root of the average squared deviations about the mean. More succinctly, it is equal to the square root of the variance.
 The variance is equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean. The formula used to calculate the variance (s², also written σ²) using deviation scores is
s² = Σ(X − X̄)² / n
 Here (X − X̄) is the deviation of a score from the mean. The symbol σ is the lowercase Greek sigma; σ² is used as a standard description of the variance. Simply stated, the variance is calculated by squaring and summing all the deviation scores and then dividing by the total number of scores.
 Though the variance is a useful statistic commonly used in data analysis, it shows the variable in squared deviations around the mean rather than in deviations around the mean. In other words, the variance is the average squared deviation around the mean. To get it back into the units that will make sense to us, we need to take the square root of the variance. The square root of the variance is the standard deviation (s), and it is represented by the following formula:
s = √( Σ(X − X̄)² / n )
 The standard deviation is thus the square root of the average squared deviation around the mean.
 Although the standard deviation is not an average deviation, it gives a useful approximation of how much a typical score is above or below the average score.
 The standard deviation is a very useful measure of variation because each individual score’s distance from the mean of the distribution is factored into its computation.

Skewness
 Distributions can be characterized by their skewness, or the nature and extent to which symmetry is absent.
 Skewness is an indication of how the measurements in a distribution are distributed.
 A distribution has a positive skew when relatively few of the scores fall at the high end of the distribution. Positively skewed examination results may indicate that the test was too difficult. More items that were easier would have been desirable in order to better discriminate at the lower end of the distribution of test scores.
 A distribution has a negative skew when relatively few of the scores fall at the low end of the distribution. Negatively skewed examination results may indicate that the test was too easy. In this case, more items of a higher level of difficulty would make it possible to better discriminate between scores at the upper end of the distribution.
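
To make the measures of central tendency concrete, the sketch below builds a simple frequency distribution and computes the mean, median, and mode for the 20 final-exam scores listed in the discussion of the mode. The function names are hypothetical and chosen only for illustration; the arithmetic follows the definitions in these notes.

```python
from collections import Counter

# The 20 final-exam scores quoted in the section on the mode.
scores = [51, 49, 51, 50, 66, 52, 53, 38, 17, 66,
          33, 44, 73, 13, 21, 91, 87, 92, 47, 3]


def frequency_distribution(xs):
    """Simple frequency distribution: each score with its count."""
    return Counter(xs)


def mean(xs):
    """Arithmetic mean: sum of the scores divided by the number of cases."""
    return sum(xs) / len(xs)


def mean_from_frequencies(freq):
    """Mean from a frequency distribution: sum of f * X divided by n."""
    n = sum(freq.values())
    return sum(f * x for x, f in freq.items()) / n


def median(xs):
    """Middle score, or the mean of the two middle scores if n is even."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(xs):
    """All scores tied for the highest frequency, in ascending order."""
    freq = Counter(xs)
    top = max(freq.values())
    return sorted(x for x, f in freq.items() if f == top)


freq = frequency_distribution(scores)
print("mean:", mean(scores))
print("mean from frequencies:", mean_from_frequencies(freq))
print("median:", median(scores))
print("modes:", modes(scores))
```

For these scores the two modes are 51 and 66, which matches the bimodal description in the notes, and because there are 20 scores the median is the mean of the 10th and 11th ordered scores.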

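The measures of variability described above can be sketched in the same way. The code below is a minimal illustration with hypothetical function names. It follows the formulas in these notes, which divide by n; many statistics packages divide by n − 1 when estimating from a sample, so results may differ slightly from software defaults. The quartiles come from Python's `statistics.quantiles`, whose default method is only one of several conventions for locating Q1 and Q3.

```python
import math
import statistics

# The same 20 final-exam scores used in the section on the mode.
scores = [51, 49, 51, 50, 66, 52, 53, 38, 17, 66,
          33, 44, 73, 13, 21, 91, 87, 92, 47, 3]


def value_range(xs):
    """Range: highest score minus lowest score."""
    return max(xs) - min(xs)


def average_deviation(xs):
    """AD: mean of the absolute deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def variance(xs):
    """Variance as in the notes: mean of the squared deviations (divide by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def standard_deviation(xs):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(xs))


def interquartile_ranges(xs):
    """Return (interquartile range, semi-interquartile range).

    Assumption: quartiles are taken from statistics.quantiles with its
    default method; other textbooks may place Q1 and Q3 slightly differently.
    """
    q1, _q2, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    return iqr, iqr / 2


iqr, semi_iqr = interquartile_ranges(scores)
print("range:", value_range(scores))
print("interquartile range:", iqr)
print("semi-interquartile range:", semi_iqr)
print("average deviation:", average_deviation(scores))
print("variance:", variance(scores))
print("standard deviation:", standard_deviation(scores))
```

Because the range uses only the two extreme scores, changing the single score of 3 would change it immediately, while the interquartile range would be far less affected.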
Kurtosis
 The term testing professionals use to refer to the steepness of a distribution in its center is kurtosis.
 To the root kurtic is added one of the prefixes platy-, lepto-, or meso- to describe the peakedness/flatness of three general types of curves.
 Distributions are generally described as platykurtic (relatively flat), leptokurtic (relatively peaked), or, somewhere in the middle, mesokurtic.
 Distributions that have high kurtosis are characterized by a high peak and “fatter” tails compared to a normal distribution.
 In contrast, lower kurtosis values indicate a distribution with a rounded peak and thinner tails.

The Normal Curve
 Development of the concept of a normal curve began in the middle of the eighteenth century with the work of Abraham DeMoivre and, later, the Marquis de Laplace.
 At the beginning of the nineteenth century, Karl Friedrich Gauss made some substantial contributions. Through the early nineteenth century, scientists referred to it as the “Laplace-Gaussian curve.” Karl Pearson is credited with being the first to refer to the curve as the normal curve, perhaps in an effort to be diplomatic to all of the people who helped develop it.
 Theoretically, the normal curve is a bell-shaped, smooth, mathematically defined curve that is highest at its center. From the center it tapers on both sides approaching the X-axis asymptotically (meaning that it approaches, but never touches, the axis).

The Area under the Normal Curve
 The normal curve can be conveniently divided into areas defined in units of standard deviation.
 A hypothetical distribution of National Spelling Test scores with a mean of 50 and a standard deviation of 15 is illustrated in the picture below. In this example, a score equal to 1 standard deviation above the mean would be equal to 65 (X̄ + 1s = 50 + 15 = 65).
[Figure: normal curve of National Spelling Test scores, mean 50, standard deviation 15]
 The graph tells us that 99.74% of all scores in these normally distributed spelling-test data lie within ±3 standard deviations of the mean.
 Stated another way, 99.74% of all spelling test scores lie between 5 and 95.
 This graph also illustrates the following characteristics of all normal distributions:
■ 50% of the scores occur above the mean and 50% of the scores occur below the mean.
■ Approximately 34% of all scores occur between the mean and 1 standard deviation above the mean.
■ Approximately 34% of all scores occur between the mean and 1 standard deviation below the mean.
■ Approximately 68% of all scores occur between the mean and ±1 standard deviation.
■ Approximately 95% of all scores occur between the mean and ±2 standard deviations.
 A normal curve has two tails. The area on the normal curve between 2 and 3 standard deviations above the mean is referred to as a tail. The area between 2 and 3 standard deviations below the mean is also referred to as a tail.

Standard Scores
 Simply stated, a standard score is a raw score that has been converted from one scale to another scale, where the latter scale has some arbitrarily set mean and standard deviation.
 Raw scores may be converted to standard scores because standard scores are more easily interpretable than raw scores. With a standard score, the position of a testtaker’s performance relative to other testtakers is readily apparent.
 Different systems for standard scores exist, each unique in terms of its respective mean and standard deviation.
 First for consideration is the type of standard score scale that may be thought of as the zero plus or minus one scale. This is so because it has a mean set at 0 and a standard deviation set at 1.
 Raw scores converted into standard scores on this scale are more popularly referred to as z scores.
 Standard scores converted from raw scores may involve either linear or nonlinear transformations.
Linear Transformation
 A standard score obtained by a linear transformation is one that retains a direct numerical relationship to the original raw score.
 The magnitude of differences between such standard scores exactly parallels the differences between corresponding raw scores. Sometimes scores may undergo more than one transformation.
Nonlinear Transformation
 A nonlinear transformation may be required when the data under consideration are not normally distributed yet comparisons with normal distributions need to be made.
 In a nonlinear transformation, the resulting standard score does not necessarily have a direct numerical relationship to the original, raw score. As the result of a nonlinear transformation, the original distribution is said to have been normalized.
Normalized Standard Scores
 Many test developers hope that the test they are working on will yield a normal distribution of scores. Yet even after very large samples have been tested with the instrument under development, skewed distributions result.
 One alternative available to the test developer is to normalize the distribution. Conceptually, normalizing a distribution involves “stretching” the skewed curve into the shape of a normal curve and creating a corresponding scale of standard scores, a scale that is technically referred to as a normalized standard score scale.

Z Scores
 One problem with means and standard deviations is that they do not convey enough information for us to make meaningful assessments or accurate interpretations of data. Other metrics are designed for more exact interpretations.
 The z score transforms data into standardized units that are easier to interpret. A z score is the difference between a score and the mean, divided by the standard deviation:
z = (Xi − X̄) / s


 In other words, a z score is the deviation of a score Xi from the mean in standard deviation units. If a score is equal to the mean, then its z score is 0.
 A z score results from the conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution.
 In essence, a z score is equal to the difference between a particular raw score and the mean divided by the standard deviation.

T Scores
 If the scale used in the computation of z scores is called a zero plus or minus one scale, then the scale used in the computation of T scores can be called a fifty plus or minus ten scale; that is, a scale with a mean set at 50 and a standard deviation set at 10.
 Devised by W. A. McCall (1922, 1939) and named a T score in honor of his professor E. L. Thorndike, this standard score system is composed of a scale that ranges from 5 standard deviations below the mean to 5 standard deviations above the mean.
 Thus, for example, a raw score that fell exactly at 5 standard deviations below the mean would be equal to a T score of 0, a raw score that fell at the mean would be equal to a T of 50, and a raw score 5 standard deviations above the mean would be equal to a T of 100.
 One advantage in using T scores is that none of the scores is negative. By contrast, in a z score distribution, scores can be positive and negative; this can make further computation cumbersome in some instances.

Other Standard Scores
 Numerous other standard scoring systems exist. Researchers during World War II developed a standard score with a mean of 5 and a standard deviation of approximately 2. Divided into nine units, the scale was christened a stanine, a term that was a contraction of the words standard and nine.
 Another type of standard score is employed on tests such as the Scholastic Aptitude Test (SAT) and the Graduate Record Examination (GRE). Raw scores on those tests are converted to standard scores such that the resulting distribution has a mean of 500 and a standard deviation of 100.

Correlation and Inference
 Central to psychological testing and assessment are inferences (deduced conclusions) about how some things (such as traits, abilities, or interests) are related to other things (such as behavior).
 A coefficient of correlation (or correlation coefficient) is a number that provides us with an index of the strength of the relationship between two things.
 An understanding of the concept of correlation and an ability to compute a coefficient of correlation is therefore central to the study of tests and measurement.

The Concept of Correlation
 Simply stated, correlation is an expression of the degree and direction of correspondence between two things.
 A coefficient of correlation (r) expresses a linear relationship between two (and only two) variables, usually continuous in nature. It reflects the degree of concomitant variation between variable X and variable Y.
 The coefficient of correlation is the numerical index that expresses this relationship: It tells us the extent to which X and Y are “co-related.”
 The meaning of a correlation coefficient is interpreted by its sign and magnitude. If a correlation coefficient were asked to supply information about its sign, it would answer “plus” (for a positive correlation), “minus” (for a negative correlation), or “none” (in the rare instance that the correlation coefficient was exactly equal to zero). If asked to supply information about its magnitude, it would respond with a number anywhere at all between −1 and +1.
 The intriguing fact about the magnitude of a correlation coefficient is that it is judged by its absolute value. This means that to the extent that we are impressed by correlation coefficients, a correlation of −.99 is every bit as impressive as a correlation of +.99.
 If two variables simultaneously increase or simultaneously decrease, then those two variables are said to be positively (or directly) correlated.
 A negative (or inverse) correlation occurs when one variable increases while the other variable decreases.
 Correlation is often confused with causation. It must be emphasized that a correlation coefficient is merely an index of the relationship between two variables, not an index of the causal relationship between two variables. Although correlation does not imply causation, there is an implication of prediction.

The Pearson r
 Many techniques have been devised to measure correlation. The most widely used of all is the Pearson r, also known as the Pearson correlation coefficient and the Pearson product-moment coefficient of correlation.
 Devised by Karl Pearson, r can be the statistical tool of choice when the relationship between the variables is linear and when the two variables being correlated are continuous (that is, they can theoretically take any value).
 Other correlational techniques can be employed with data that are discontinuous and where the relationship is nonlinear.

Coefficient of determination
 The value obtained for the coefficient of correlation can be further interpreted by deriving from it what is called a coefficient of determination, or r².
 The coefficient of determination is an indication of how much variance is shared by the X- and the Y-variables. The calculation of r² is quite straightforward. Simply square the correlation coefficient and multiply by 100; the result is equal to the percentage of the variance accounted for.

The Spearman Rho
 One commonly used alternative statistic is variously called a rank-order correlation coefficient, a rank-difference correlation coefficient, or simply Spearman’s rho.
 Developed by Charles Spearman, a British psychologist, this coefficient of correlation is frequently used when the sample size is small (fewer than 30 pairs of measurements) and especially when both sets of measurements are in ordinal (or rank-order) form.
 Spearman’s rho is a method of correlation for finding the association between two sets of ranks.
 The rho coefficient (ρ) is easy to calculate and is often used when the individuals in a sample can be ranked on two variables but their actual scores are not known or are not normally distributed.

Graphic Representations of Correlation
 One type of graphic representation of correlation is referred to by many names, including a bivariate distribution, a scatter diagram, a scattergram, or (our favorite) a scatterplot.


 A scatterplot is a simple graphing of the coordinate points for values of the X-variable (placed along the graph’s horizontal axis) and the Y-variable (placed along the graph’s vertical axis).
 Scatterplots are useful because they provide a quick indication of the direction and magnitude of the relationship, if any, between the two variables.
 Scatterplots are useful in revealing the presence of curvilinearity in a relationship.
 Curvilinearity in this context refers to an “eyeball gauge” of how curved a graph is.
 Remember that a Pearson r should be used only if the relationship between the variables is linear. If the graph does not appear to take the form of a straight line, the chances are good that the relationship is not linear. When the relationship is nonlinear, other statistical tools and techniques may be employed.

[Figure: Scatterplot (Positive)]
[Figure: Scatterplot (Negative)]
[Figure: Scatterplot (Curvilinear and with Outlier)]
[Figure: Scatterplot (Restricted vs. Unrestricted Range)]

Meta-Analysis
 Generally, the best estimate of the correlation between two variables is most likely to come not from a single study alone but from analysis of the data from several studies.
 Meta-analysis may be defined as a family of techniques used to statistically combine information across studies to produce single estimates of the data under study. The estimates derived, referred to as effect size, may take several different forms.

Effect Size
 An effect size is an estimate of the strength of the relationship (or the size of the differences) between groups.
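
Returning to the standard scores discussed earlier, the following sketch shows how a raw score could be converted to a z score and then, by a linear transformation, to a T score or to a scale with a mean of 500 and a standard deviation of 100. It also checks the areas under the normal curve quoted in these notes. The function names are hypothetical; the spelling-test mean of 50 and standard deviation of 15 come from the notes. Note that the exact area within ±3 standard deviations is about 99.73%; the 99.74% in the notes reflects rounded table values.

```python
from statistics import NormalDist


def z_score(raw, mean, sd):
    """Deviation of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd


def rescale(z, new_mean, new_sd):
    """Linear transformation of a z score onto a scale with a chosen mean and SD."""
    return new_mean + new_sd * z


def t_score(raw, mean, sd):
    """T score: the 'fifty plus or minus ten' scale."""
    return rescale(z_score(raw, mean, sd), 50, 10)


def area_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    std = NormalDist()  # standard normal: mean 0, standard deviation 1
    return std.cdf(k) - std.cdf(-k)


# National Spelling Test example from the notes: mean 50, SD 15.
z = z_score(65, mean=50, sd=15)
print("z for a raw score of 65:", z)
print("T for a raw score of 65:", t_score(65, 50, 15))
print("same score on a 500/100 scale:", rescale(z, 500, 100))

for k in (1, 2, 3):
    print(f"area within ±{k} SD: {area_within(k):.4f}")
```

Because each conversion is linear, differences between standard scores keep the same proportions as differences between the raw scores they came from.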

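The Pearson r and the coefficient of determination described earlier can be illustrated with a short sketch. The formula for r did not survive in these notes, so the code uses the standard product-moment definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations); treat that choice, the function names, and the small data set as assumptions made for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two paired lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviation scores.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return sxy / math.sqrt(sxx * syy)


def percent_variance_shared(r):
    """Coefficient of determination as a percentage: r squared times 100."""
    return r ** 2 * 100


# Hypothetical paired scores, invented for this example.
hours_studied = [1, 2, 3, 4, 5, 6]
exam_scores = [52, 55, 61, 60, 68, 71]

r = pearson_r(hours_studied, exam_scores)
print("r:", round(r, 3))
print("percent of variance shared:", round(percent_variance_shared(r), 1))
```

A correlation of −.50 and a correlation of +.50 give the same coefficient of determination, which is one way to see that magnitude is judged by absolute value.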

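Spearman's rho, described earlier as a correlation between two sets of ranks, can be sketched by ranking each variable and then correlating the ranks. This is one common way to compute it, assumed here for illustration; the function names are hypothetical, and tied scores receive the average of the ranks they occupy.

```python
import math


def average_ranks(xs):
    """Rank scores from 1 (lowest); tied scores share the average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every score tied with the one at position i.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the product-moment correlation of the two rank lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x = sum(rx) / n
    mean_y = sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when all ranks are tied")
    return sxy / math.sqrt(sxx * syy)


# A perfectly monotonic but nonlinear relationship still gives rho = 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```

Because only the rank order matters, rho can be computed even when the actual scores are unknown, as long as the individuals can be ranked on both variables.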