A Statistics Refresher
FIRST SEMESTER | A.Y. 2022-2023
Why Do We Need Statistics?
Scales of Measurement
Measurement
We may formally define measurement as the act of assigning numbers or symbols to characteristics of things (people, events, whatever) according to rules.
Scale
A scale is a set of numbers (or other symbols) whose properties model empirical properties of the objects to which the numbers are assigned.
There are various ways in which a scale can be categorized. One way of categorizing a scale is according to the type of variable being measured. Thus, a scale used to measure a continuous variable might be referred to as a continuous scale, whereas a scale used to measure a discrete variable might be referred to as a discrete scale.
Error
Measurement always involves error. In the language of assessment, error refers to the collective influence of all of the factors on a test score or measurement beyond those specifically measured by the test or measurement.
Properties of Scales
Magnitude
Magnitude is the property of "moreness." A scale has the property of magnitude if we can say that a particular instance of the attribute represents more, less, or equal amounts of the given quantity than does another instance.
On a scale of height, for example, if we can say that John is taller than Fred, then the scale has the property of magnitude.
Equal Intervals
A scale has the property of equal intervals if the difference between two points at any place on the scale has the same meaning as the difference between two other points that differ by the same number of scale units.
For example, the difference between inch 2 and inch 4 on a ruler represents the same quantity as the difference between inch 10 and inch 12: exactly 2 inches.
When a scale has the property of equal intervals, the relationship between the measured units and some outcome can be described by a straight line or a linear equation.
Absolute 0
An absolute 0 is obtained when nothing of the property being measured exists.
Nominal Scales
Nominal scales classify or categorize. For each of the following, a yes or no response results in placement into one of a set of mutually exclusive groups: suicidal or not, under care for psychiatric disorder or not, and felon or not.
Ordinal Scales
A scale with the property of magnitude but not equal intervals or an absolute 0 is an ordinal scale. This scale allows you to rank individuals or objects but not to say anything about the meaning of the differences between the ranks.
In business and organizational settings, job applicants may be rank-ordered according to their desirability for a position. In clinical settings, people on a waiting list for psychotherapy may be rank-ordered according to their need for treatment.
Ordinal scales imply nothing about how much greater one ranking is than another. Even though ordinal scales may employ numbers or "scores" to represent the rank ordering, the numbers do not indicate units of measurement.
Ordinal scales have no absolute zero point. In the case of a test of job performance ability, every testtaker, regardless of standing on the test, is presumed to have some ability. No testtaker is presumed to have zero ability.
Interval Scales
When a scale has the properties of magnitude and equal intervals but not an absolute 0, we refer to it as an interval scale. With interval scales, we have reached a level of measurement at which it is possible to average a set of measurements and obtain a meaningful result.
The most common example of an interval scale is the measurement of temperature in degrees Fahrenheit. The Celsius scale of temperature is also an interval rather than a ratio scale. Although 0 represents freezing on the Celsius scale, it is not an absolute 0.
Scores on many tests, such as tests of intelligence, are analyzed statistically in ways appropriate for data at the interval level of measurement.
Ratio Scales
A scale that has all three properties (magnitude, equal intervals, and an absolute 0) is called a ratio scale. For example, consider the number of yards gained by running backs on football teams. Zero yards actually means that the player has gained no yards at all.
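The practical difference between interval and ratio scales shows up when ratios are formed. The short Python sketch below is an illustration, not part of the source texts: it converts the same pair of readings between units. On the Celsius and Fahrenheit interval scales, "twice as much" does not survive the change of unit because 0 is arbitrary; on a ratio scale such as yards gained, it does. The function names are hypothetical; the conversion constants are the standard ones (°F = °C × 9/5 + 32; 1 yard = 0.9144 m).

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert between two interval scales (neither has an absolute 0)."""
    return celsius * 9 / 5 + 32


def yards_to_meters(yards: float) -> float:
    """Convert between two ratio scales (0 means none of the quantity)."""
    return yards * 0.9144


# Interval scale: the ratio of two readings changes with the unit,
# because the zero point is arbitrary.
celsius_ratio = 20 / 10
fahrenheit_ratio = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)  # 68 / 50

# Ratio scale: the ratio is the same in any unit, because 0 is absolute.
yards_ratio = 20 / 10
meters_ratio = yards_to_meters(20) / yards_to_meters(10)

# Equal intervals still hold on the interval scale: a 10-degree Celsius
# difference is an 18-degree Fahrenheit difference wherever it occurs.
gap_low = celsius_to_fahrenheit(10) - celsius_to_fahrenheit(0)
gap_high = celsius_to_fahrenheit(40) - celsius_to_fahrenheit(30)

print(f"Celsius ratio {celsius_ratio:.2f} vs Fahrenheit ratio {fahrenheit_ratio:.2f}")
print(f"yards ratio {yards_ratio:.2f} vs meters ratio {meters_ratio:.2f}")
print(f"10-degree Celsius gaps in Fahrenheit: {gap_low:.1f} and {gap_high:.1f}")
```

Running it gives a Fahrenheit ratio of 1.36 against a Celsius ratio of 2.00, while the yards and meters ratios are both 2.00: averaging and differences are meaningful on an interval scale, but ratio statements need an absolute 0.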
BPSY 198
FIRST SEMESTER | A.Y. 2022-2023
Describing Data
Raw Score
A raw score is a straightforward, unmodified accounting of
performance that is usually numerical.
A raw score may reflect a simple tally, as in the number of items responded to correctly on an achievement test.
Raw scores can be converted into other types of scores.
Distribution
Distribution may be defined as a set of test scores arrayed for
recording or study.
Frequency Distribution
The data from the test could be organized into a distribution of the raw scores. One way the scores could be distributed is by the frequency with which they occur.
In a frequency distribution, all scores are listed alongside the number of times each score occurred. The scores might be listed in tabular or graphic form.
Often, a frequency distribution is referred to as a simple frequency distribution to indicate that individual scores have been used and the data have not been grouped.
Another kind of frequency distribution used to summarize data is the grouped frequency distribution. In a grouped frequency distribution, test score intervals, also called class intervals, replace the actual test scores.
Measures of Central Tendency
A measure of central tendency is a statistic that indicates the average or midmost score between the extreme scores in a distribution. The center of a distribution can be defined in different ways.
Perhaps the most commonly used measure of central tendency is the arithmetic mean (or, more simply, mean), which is referred to in everyday language as the "average." The mean takes into account the actual numerical value of every score.
In special instances, such as when there are only a few scores and one or two of the scores are extreme in relation to the remaining ones, a measure of central tendency other than the mean may be desirable. Other measures of central tendency are the median and the mode.
Note that, in the formulas to follow, the standard statistical shorthand called "summation notation" (summation meaning "the sum of") is used. The Greek uppercase letter sigma, Σ, is the symbol used to signify "sum"; if X represents a test score, then the expression ΣX means "add all the test scores."
The mean
The arithmetic average score in a distribution is called the mean. To calculate the mean, we total the scores and divide the sum by the number of cases, or N. Thus, the formula for the mean, which we signify as X̄ (read "X bar"), is
$$\bar{X} = \frac{\sum X}{N}$$
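As a rough illustration of the tabulations just described, the Python sketch below builds a simple frequency distribution, a grouped frequency distribution, and the mean for a small set of hypothetical quiz scores. The function names, the interval width of 10, and the starting point of 60 are assumptions chosen for the example; the grouped version assumes whole-number scores no lower than the starting point.

```python
from collections import Counter


def simple_frequency_distribution(scores):
    """List each distinct score with the number of times it occurred (highest score first)."""
    return sorted(Counter(scores).items(), reverse=True)


def grouped_frequency_distribution(scores, width, start):
    """Tally whole-number scores into class intervals of the given width.

    Intervals run start..start+width-1, start+width..start+2*width-1, and so on,
    up to the interval holding the highest score; the highest interval is listed first.
    Assumes start <= min(scores).
    """
    table = []
    low = start
    while low <= max(scores):
        high = low + width - 1
        count = sum(1 for s in scores if low <= s <= high)
        table.append(((low, high), count))
        low += width
    return table[::-1]


def mean(scores):
    """Arithmetic mean: the sum of the scores (ΣX) divided by the number of cases (N)."""
    return sum(scores) / len(scores)


# Hypothetical quiz scores for eight students.
quiz = [78, 85, 85, 90, 62, 71, 85, 90]

for score, freq in simple_frequency_distribution(quiz):
    print(score, freq)
for (low, high), freq in grouped_frequency_distribution(quiz, width=10, start=60):
    print(f"{low}-{high}", freq)
print("mean =", mean(quiz))
```

In the grouped table the 80-89 interval shows a count of 3, but the fact that all three scores were 85 is no longer visible; that loss of detail is what a grouped distribution trades for compactness.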
The median
When the total number of scores ordered is an even number, the median can be calculated by determining the arithmetic mean of the two middle scores.
The mode
The most frequently occurring score in a distribution of scores is the mode. Distributions that contain a tie for the designation "most frequently occurring score" can have more than one mode.
Consider the following scores, arranged in no particular order, obtained by 20 students on the final exam of a new trade school called the Home Study School of Elvis Presley Impersonators:
51 49 51 50 66 52 53 38 17 66
33 44 73 13 21 91 87 92 47 3
These scores are said to have a bimodal distribution because there are two scores (51 and 66) that occur with the highest frequency (twice each). Except with nominal data, the mode tends not to be a very commonly used measure of central tendency.
Measures of Variability
Variability is an indication of how scores in a distribution are scattered or dispersed. Statistics that describe the amount of variation in a distribution are referred to as measures of variability. Some measures of variability include the range, the interquartile range, the semi-interquartile range, the average deviation, the standard deviation, and the variance.
Range
The range of a distribution is equal to the difference between the highest and the lowest scores.
The range is the simplest measure of variability to calculate, but its potential use is limited. Because the range is based entirely on the values of the lowest and highest scores, one extreme score (if it happens to be the lowest or the highest) can radically alter the value of the range.
The interquartile and semi-interquartile ranges
A distribution of test scores (or any other data, for that matter) can be divided into four parts such that 25% of the test scores occur in each quarter.
The dividing points between the four quarters in the distribution are the quartiles. There are three of them, respectively labeled Q1, Q2, and Q3. Note that quartile refers to a specific point whereas quarter refers to an interval. The interquartile range is the difference between Q3 and Q1, and the semi-interquartile range is equal to half of the interquartile range.
The average deviation
The average deviation (AD) is computed as
$$AD = \frac{\sum |x|}{n}$$
where x is a score's deviation from the mean, obtained by subtracting the mean from the score (X − mean = x). The bars on each side of x indicate that it is the absolute value of the deviation score (ignoring the positive or negative sign and treating all deviation scores as positive). All the deviation scores are then summed and divided by the total number of scores (n) to arrive at the average deviation.
The average deviation is rarely used. Perhaps this is so because the deletion of algebraic signs renders it a useless measure for purposes of any further operations.
The Standard Deviation
We may define the standard deviation as a measure of variability equal to the square root of the average squared deviations about the mean. More succinctly, it is equal to the square root of the variance.
The variance is equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean. The formula used to calculate the variance (σ²) using deviation scores is
$$\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$$
where (X − X̄) is the deviation of a score from the mean. The symbol σ is the lowercase Greek sigma; σ² is used as a standard description of the variance. Simply stated, the variance is calculated by squaring and summing all the deviation scores and then dividing by the total number of scores.
Though the variance is a useful statistic commonly used in data analysis, it expresses the variable in squared deviations around the mean rather than in deviations around the mean. In other words, the variance is the average squared deviation around the mean. To get it back into units that will make sense to us, we need to take the square root of the variance. The square root of the variance is the standard deviation (σ), and it is represented by the following formula:
$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$
The standard deviation is thus the square root of the average squared deviation around the mean.
Although the standard deviation is not an average deviation, it gives a useful approximation of how much a typical score is above or below the average score.
The standard deviation is a very useful measure of variation because each individual score's distance from the mean of the distribution is factored into its computation.
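The measures of central tendency and variability covered so far can be checked on the 20 final-exam scores from the Elvis Presley Impersonators example. The Python sketch below uses the standard-library statistics module plus hand-written sums. Textbooks and software use several slightly different rules for locating quartiles, so the quartile values depend on the method chosen (here, the module's default). The variance and standard deviation divide by N, as the handout describes; statistics.variance and statistics.stdev would instead divide by N − 1.

```python
import statistics

# Final-exam scores of the 20 students, as listed in the handout.
scores = [51, 49, 51, 50, 66, 52, 53, 38, 17, 66,
          33, 44, 73, 13, 21, 91, 87, 92, 47, 3]

n = len(scores)
mean = sum(scores) / n                      # X-bar = ΣX / N
median = statistics.median(scores)          # even N: mean of the two middle scores
modes = statistics.multimode(scores)        # every score tied for most frequent
score_range = max(scores) - min(scores)     # highest minus lowest

# Quartiles Q1, Q2, Q3 (default "exclusive" interpolation method).
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1
semi_iqr = iqr / 2

# Average deviation: mean of the absolute deviation scores |X - X-bar|.
avg_dev = sum(abs(score - mean) for score in scores) / n

# Variance and standard deviation, dividing by N.
variance = sum((score - mean) ** 2 for score in scores) / n
std_dev = variance ** 0.5

print(f"mean={mean:.2f} median={median} modes={modes} range={score_range}")
print(f"Q1={q1} Q2={q2} Q3={q3} IQR={iqr} semi-IQR={semi_iqr}")
print(f"AD={avg_dev:.3f} variance={variance:.4f} SD={std_dev:.2f}")
```

For these scores the mean is 49.85, the median is 50.5 (the mean of the two middle scores, 50 and 51), and multimode returns both 51 and 66, confirming the bimodal distribution.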
Skewness
Kurtosis
The term testing professionals use to refer to the steepness of a distribution in its center is kurtosis. To the root kurtic is added one of the prefixes platy-, lepto-, or meso- to describe the peakedness/flatness of three general types of curves. Distributions are generally described as platykurtic (relatively flat), leptokurtic (relatively peaked), or, somewhere in the middle, mesokurtic.
Distributions that have high kurtosis are characterized by a high peak and "fatter" tails compared to a normal distribution. In contrast, lower kurtosis values indicate a distribution with a rounded peak and thinner tails.
The Normal Curve
Development of the concept of a normal curve began in the middle of the eighteenth century with the work of Abraham DeMoivre and, later, the Marquis de Laplace. At the beginning of the nineteenth century, Carl Friedrich Gauss made some substantial contributions. Through the early nineteenth century, scientists referred to it as the "Laplace-Gaussian curve." Karl Pearson is credited with being the first to refer to the curve as the normal curve, perhaps in an effort to be diplomatic to all of the people who helped develop it.
Theoretically, the normal curve is a bell-shaped, smooth, mathematically defined curve that is highest at its center. From the center it tapers on both sides, approaching the X-axis asymptotically (meaning that it approaches, but never touches, the axis).
The Area under the Normal Curve
The normal curve can be conveniently divided into areas defined in units of standard deviation.
Consider a hypothetical distribution of National Spelling Test scores with a mean of 50 and a standard deviation of 15. In this example, a score equal to 1 standard deviation above the mean would be equal to 65 ($\bar{X} + 1\sigma = 50 + 15 = 65$).
In these normally distributed spelling-test data, 99.74% of all scores lie between ±3 standard deviations of the mean. Stated another way, 99.74% of all spelling-test scores lie between 5 and 95.
The same example illustrates the following characteristics of all normal distributions:
■ 50% of the scores occur above the mean and 50% of the scores occur below the mean.
■ Approximately 34% of all scores occur between the mean and 1 standard deviation above the mean.
■ Approximately 34% of all scores occur between the mean and 1 standard deviation below the mean.
■ Approximately 68% of all scores occur within ±1 standard deviation of the mean.
■ Approximately 95% of all scores occur within ±2 standard deviations of the mean.
A normal curve has two tails. The area on the normal curve between 2 and 3 standard deviations above the mean is referred to as a tail. The area between −2 and −3 standard deviations below the mean is also referred to as a tail.
Standard Scores
Simply stated, a standard score is a raw score that has been converted from one scale to another scale, where the latter scale has some arbitrarily set mean and standard deviation.
Raw scores may be converted to standard scores because standard scores are more easily interpretable than raw scores. With a standard score, the position of a testtaker's performance relative to other testtakers is readily apparent.
Different systems for standard scores exist, each unique in terms of its respective mean and standard deviation.
First for consideration is the type of standard score scale that may be thought of as the zero plus or minus one scale. This is so because it has a mean set at 0 and a standard deviation set at 1. Raw scores converted into standard scores on this scale are more popularly referred to as z scores.
Standard scores converted from raw scores may involve either linear or nonlinear transformations.
Linear Transformation
A standard score obtained by a linear transformation is one that retains a direct numerical relationship to the original raw score. The magnitude of differences between such standard scores exactly parallels the differences between corresponding raw scores. Sometimes scores may undergo more than one transformation.
Nonlinear Transformation
A nonlinear transformation may be required when the data under consideration are not normally distributed yet comparisons with normal distributions need to be made. In a nonlinear transformation, the resulting standard score does not necessarily have a direct numerical relationship to the original raw score. As the result of a nonlinear transformation, the original distribution is said to have been normalized.
Normalized Standard Scores
Many test developers hope that the test they are working on will yield a normal distribution of scores. Yet even after very large samples have been tested with the instrument under development, skewed distributions sometimes result.
One alternative available to the test developer is to normalize the distribution. Conceptually, normalizing a distribution involves "stretching" the skewed curve into the shape of a normal curve and creating a corresponding scale of standard scores, a scale that is technically referred to as a normalized standard score scale.
Z Scores
One problem with means and standard deviations is that they do not convey enough information for us to make meaningful assessments or accurate interpretations of data. Other metrics are designed for more exact interpretations.
The Z score transforms data into standardized units that are easier to interpret. A Z score is the difference between a score and the mean, divided by the standard deviation:
$$Z = \frac{X_i - \bar{X}}{\sigma}$$
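The spelling-test example and the Z-score definition can be reproduced with Python's statistics.NormalDist. In this sketch (the helper names are hypothetical), a raw score of 65 is converted to z and then re-expressed by linear transformation on two scales with arbitrarily set means and standard deviations: mean 50/SD 10 (the T-score scale) and mean 500/SD 100. The loop then computes the exact area of the normal curve within ±1, ±2, and ±3 standard deviations.

```python
from statistics import NormalDist


def z_score(raw, mean, sd):
    """How many standard deviations a raw score lies above (+) or below (-) the mean."""
    return (raw - mean) / sd


def rescale(z, new_mean, new_sd):
    """Linear transformation of a z score onto a scale with a set mean and SD."""
    return new_mean + new_sd * z


# Hypothetical National Spelling Test from the handout: mean 50, SD 15.
MEAN, SD = 50, 15

z = z_score(65, MEAN, SD)          # 65 is 1 SD above the mean, so z = 1.0
t_score = rescale(z, 50, 10)       # mean-50, SD-10 (T-score) scale: 60.0
scale_500 = rescale(z, 500, 100)   # mean-500, SD-100 scale: 600.0
print(f"z = {z}, T = {t_score}, mean-500 scale = {scale_500}")

# Area of the normal curve within ±k standard deviations of the mean.
standard_normal = NormalDist(mu=0, sigma=1)   # the z scale
for k in (1, 2, 3):
    area = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"±{k} SD, raw scores {MEAN - k * SD} to {MEAN + k * SD}: {area:.2%}")
```

The areas print as about 68.27%, 95.45%, and 99.73%, in line with the approximate 68% and 95% figures; the 99.74% quoted for ±3 standard deviations differs from 99.73% only by rounding.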
A Z score, then, is the deviation of a score $X_i$ from the mean in standard deviation units. If a score is equal to the mean, then its Z score is 0.
A Z score results from the conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution.
T Scores
If the scale used in the computation of z scores is called a zero plus or minus one scale, then the scale used in the computation of T scores can be called a fifty plus or minus ten scale; that is, a scale with a mean set at 50 and a standard deviation set at 10.
Devised by W. A. McCall (1922, 1939) and named a T score in honor of his professor E. L. Thorndike, this standard score system is composed of a scale that ranges from 5 standard deviations below the mean to 5 standard deviations above the mean.
Thus, for example, a raw score that fell exactly at 5 standard deviations below the mean would be equal to a T score of 0, a raw score that fell at the mean would be equal to a T of 50, and a raw score 5 standard deviations above the mean would be equal to a T of 100.
One advantage in using T scores is that none of the scores is negative. By contrast, in a z score distribution, scores can be positive and negative; this can make further computation cumbersome in some instances.
Other Standard Scores
Numerous other standard scoring systems exist. Researchers during World War II developed a standard score with a mean of 5 and a standard deviation of approximately 2. Divided into nine units, the scale was christened a stanine, a term that was a contraction of the words standard and nine.
Another type of standard score is employed on tests such as the Scholastic Aptitude Test (SAT) and the Graduate Record Examination (GRE). Raw scores on those tests are converted to standard scores such that the resulting distribution has a mean of 500 and a standard deviation of 100.
Correlation and Inference
Central to psychological testing and assessment are inferences (deduced conclusions) about how some things (such as traits, abilities, or interests) are related to other things (such as behavior).
A coefficient of correlation (or correlation coefficient) is a number that provides us with an index of the strength of the relationship between two things.
An understanding of the concept of correlation and an ability to compute a coefficient of correlation are therefore central to the study of tests and measurement.
The Concept of Correlation
Simply stated, correlation is an expression of the degree and direction of correspondence between two things.
A coefficient of correlation (r) expresses a linear relationship between two (and only two) variables, usually continuous in nature. It reflects the degree of concomitant variation between variable X and variable Y.
The coefficient of correlation is the numerical index that expresses this relationship: it tells us the extent to which X and Y are "co-related."
The meaning of a correlation coefficient is interpreted by its sign and magnitude. If a correlation coefficient were a person asked about its sign, it would answer "plus" (for a positive correlation), "minus" (for a negative correlation), or "none" (in the rare instance that the correlation coefficient was exactly equal to zero). If asked to supply information about its magnitude, it would respond with a number anywhere between −1 and +1.
An intriguing fact about the magnitude of a correlation coefficient is that it is judged by its absolute value. This means that, to the extent that we are impressed by correlation coefficients, a correlation of −.99 is every bit as impressive as a correlation of +.99.
If two variables simultaneously increase or simultaneously decrease, then those two variables are said to be positively (or directly) correlated. A negative (or inverse) correlation occurs when one variable increases while the other variable decreases.
Correlation is often confused with causation. It must be emphasized that a correlation coefficient is merely an index of the relationship between two variables, not an index of the causal relationship between two variables. Although correlation does not imply causation, there is an implication of prediction.
The Pearson r
Many techniques have been devised to measure correlation. The most widely used of all is the Pearson r, also known as the Pearson correlation coefficient and the Pearson product-moment coefficient of correlation.
Devised by Karl Pearson, r can be the statistical tool of choice when the relationship between the variables is linear and when the two variables being correlated are continuous (that is, they can theoretically take any value). Other correlational techniques can be employed with data that are discontinuous and where the relationship is nonlinear.
Coefficient of determination
The value obtained for the coefficient of correlation can be further interpreted by deriving from it what is called a coefficient of determination, or r².
The coefficient of determination is an indication of how much variance is shared by the X- and the Y-variables. The calculation of r² is quite straightforward: simply square the correlation coefficient. Multiplying the result by 100 gives the percentage of the variance accounted for. For example, r = .50 gives r² = .25, so 25% of the variance is shared.
The Spearman Rho
One commonly used alternative statistic is variously called a rank-order correlation coefficient, a rank-difference correlation coefficient, or simply Spearman's rho.
Developed by Charles Spearman, a British psychologist, this coefficient of correlation is frequently used when the sample size is small (fewer than 30 pairs of measurements) and especially when both sets of measurements are in ordinal (or rank-order) form.
Spearman's rho is a method of correlation for finding the association between two sets of ranks. The rho coefficient (ρ) is easy to calculate and is often used when the individuals in a sample can be ranked on two variables but their actual scores are not known or do not have a normal distribution.
Graphic Representations of Correlation
Scatterplot
One type of graphic representation of correlation is referred to by many names, including a bivariate distribution, a scatter diagram, a scattergram, or (our favorite) a scatterplot.
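Pearson r, the coefficient of determination, and Spearman's rho can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: pearson_r uses the deviation-score definition (the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations), and spearman_rho simply applies Pearson r to the ranks, giving tied values their average rank. The familiar shortcut ρ = 1 − 6Σd²/[n(n² − 1)] gives the same result only when there are no ties. The function names and the hours/quiz data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length lists of scores.

    Undefined (division by zero) if either list is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            result[order[pos]] = average_rank
        start = end + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson r computed on the two sets of ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data: hours of review and quiz scores for six students.
hours = [1, 2, 3, 4, 5, 6]
quiz = [55, 60, 62, 70, 74, 90]

r = pearson_r(hours, quiz)
print(f"r = {r:.2f}; r squared = {r * r:.2f}, i.e. {r * r:.0%} of the variance shared")
print(f"rho = {spearman_rho(hours, quiz):.2f}")
```

Because the quiz score rises with every extra hour, the two sets of ranks agree perfectly and rho is exactly 1, while r falls somewhat short of 1 because the rise is not perfectly linear.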
[Figures: Scatterplot (Positive) and Scatterplot (Negative)]
Meta-Analysis
Generally, the best estimate of the correlation between two variables is most likely to come not from a single study alone but from analysis of the data from several studies.
Meta-analysis may be defined as a family of techniques used to statistically combine information across studies to produce single estimates of the data under study. The estimates derived, referred to as effect size, may take several different forms.
Effect Size
An effect size is an estimate of the strength of a relationship, or of the size of the differences between groups.
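One common way to combine correlations across studies (not necessarily the method the source texts have in mind) is a fixed-effect average on Fisher's z scale: each r is transformed with z = atanh(r), weighted by n − 3 (the reciprocal of the sampling variance of z), averaged, and transformed back with tanh. The sketch below assumes that approach; the function name and the three studies are hypothetical.

```python
import math


def combine_correlations(studies):
    """Fixed-effect combined correlation via Fisher's z transformation.

    `studies` is a list of (r, n) pairs. Each r becomes z = atanh(r), which is
    weighted by n - 3 (the reciprocal of z's sampling variance); the weighted
    mean z is converted back to the r metric with tanh.
    """
    weights = [n - 3 for _, n in studies]
    zs = [math.atanh(r) for r, _ in studies]
    mean_z = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(mean_z)


# Hypothetical studies of the same two variables: (correlation, sample size).
studies = [(0.30, 50), (0.45, 120), (0.20, 80)]
combined = combine_correlations(studies)
print(f"combined r = {combined:.2f}")
```

Because the second study is the largest, it receives the most weight. A random-effects model, which also allows the true correlation to vary from study to study, is a common alternative.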
References:
Cohen, R. J., & Swerdlik, M. E. (2017). Psychological testing and
assessment (9th ed.). McGraw Hill Education.
Kaplan, R. M., & Saccuzzo, D. P. (2019). Psychological testing: Principles, applications, and issues (9th ed.). Cengage Learning.