
Introduction to Statistics

Definition of Statistics
Statistics is generally known as a field of mathematics. It is so closely associated with numbers and figures that the term “statistics” can be read as a shortened version of “statistical procedures”. Accordingly, statistics is defined as a set of mathematical procedures used for organizing, summarizing, and interpreting information. In the behavioral sciences, statistics is used to assess and evaluate research results, specifically to:
1. Organize and summarize the information so that the researcher can see what happened in
the research study and can communicate results to others
2. Help the researcher to answer the questions that initiated the research by determining
exactly what general conclusions are justified based on the specific results that were
obtained.
Example 1: You want to know whether students learn better through printed materials or online materials. You must gather information such as their grades or activity scores. Once you have this information, you use statistics to make sense of it. Statistics can help you decide which set of grades (printed or online) is better.
Statistics is a universal set of techniques understood by the scientific community: a technique one researcher uses is known to, and used by, other researchers as well.

Descriptive and Inferential Statistics


The two main branches of statistics are descriptive and inferential statistics. Descriptive statistics
are statistical procedures used to summarize, organize, and simplify data. The data are commonly
presented in a table or graph. Descriptive statistics includes the following:
 Measures of central tendency (mean, median, mode)
 Measures of variability (range, standard deviation, variance)
 Measures of relative position (percentile, quartile)
 Skewness
Inferential statistics consists of techniques that allow us to study samples and then generalize
about the populations from which they were selected. This includes:
 Hypothesis testing
 ANOVA
 Regression analysis
 Chi-square
Because populations are typically very large, it usually is not possible to measure everyone in the
population. Therefore, a sample is selected to represent the population. By analyzing the results from the
sample, we hope to make general statements about the population. Typically, researchers use sample
statistics as the basis for drawing conclusions about population parameters.
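To make this concrete, here is a minimal Python sketch; the population values, sample size, and seed are invented for illustration. It draws a random sample from a simulated population and compares the sample mean (a statistic) against the population mean (a parameter).

    import random

    # Simulated population of 10,000 exam scores (values are invented).
    random.seed(1)
    population = [random.gauss(75, 10) for _ in range(10_000)]

    # Researchers can rarely measure everyone, so a sample is drawn instead.
    sample = random.sample(population, 50)

    population_mean = sum(population) / len(population)  # parameter
    sample_mean = sum(sample) / len(sample)              # statistic

    print(f"Population mean (parameter): {population_mean:.2f}")
    print(f"Sample mean (statistic):     {sample_mean:.2f}")

The sample mean will not match the population mean exactly, but it will usually be close; inferential statistics quantifies how close we can expect it to be.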

Scales of Measurement
Data collection requires that we make measurements of our observations. Measurement involves
assigning individuals or events to categories. The categories can simply be names such as male/female or
employed/unemployed, or they can be numerical values such as 68 inches or 175 pounds. The categories
used to measure a variable make up a scale of measurement, and the relationships between the categories
determine different types of scales.
Scales of measurement are important for two reasons: they highlight the limitations of each scale, and they help determine which statistics can be used to evaluate the data. Specifically, certain statistical procedures are used with numerical scores from interval or ratio scales, while other statistical procedures are used with non-numerical scores from nominal or ordinal scales.
Nominal Scale
The word nominal means “having to do with names.” Measurement on a nominal scale involves
classifying individuals into categories that have different names but are not related to each other in any
systematic way. For example, if you were measuring the academic majors for a group of college students,
the categories would be art, biology, business, chemistry, and so on. Each student would be classified in
one category according to his or her major. The measurements from a nominal scale allow us to
determine whether two individuals are different, but they do not identify either the direction or the size of
the difference. If one student is an art major and another is a biology major, we can say that they are
different, but we cannot say that art is “more than” or “less than” biology and we cannot specify how
much difference there is between art and biology. Other examples of nominal scales include classifying
people by race, gender, or occupation.
A nominal scale consists of a set of categories that have different names. Measurements on a
nominal scale label and categorize observations, but do not make any quantitative distinctions between
observations.
Although the categories on a nominal scale are not quantitative values, they are occasionally
represented by numbers. For example, the rooms or offices in a building may be identified by numbers.
You should realize that the room numbers are simply names and do not reflect any quantitative
information. Room 109 is not necessarily bigger than Room 100 and certainly not 9 points bigger. It also
is common to use numerical values as a code for nominal categories when data are entered into computer
programs. For example, the data from a survey may code males with a 0 and females with a 1. Again, the
numerical values are simply names and do not represent any quantitative difference. The scales that
follow do reflect an attempt to make quantitative distinctions.
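As a small illustration of this point, the hypothetical Python sketch below uses the 0 = male, 1 = female coding scheme from the example above (the data themselves are invented). Counting category membership is meaningful for nominal data, but arithmetic on the codes is not.

    # Nominal data coded numerically: 0 = male, 1 = female (codes are labels only).
    sex_codes = [0, 1, 1, 0, 1]

    # Counting category membership is meaningful for nominal data...
    print("number of females:", sex_codes.count(1))        # -> 3

    # ...but arithmetic on the codes is not: 0.6 here is just the proportion
    # coded 1, not an "average sex".
    print("mean of the codes:", sum(sex_codes) / len(sex_codes))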

Ordinal Scale
The categories that make up an ordinal scale not only have different names (as in a nominal scale) but also are organized in a fixed order corresponding to differences of magnitude. An ordinal scale consists of a set of categories that are organized in an ordered sequence. Measurements on an ordinal scale rank observations in terms of size or magnitude. Ordinal scales imply ranks and order, like 1st, 2nd, and 3rd or small, medium, and large. Since there is ranking or order, there is a directional relationship between the categories, typically smallest to largest or largest to smallest.
With measurements from an ordinal scale, you can determine whether two individuals are
different, and you can determine the direction of difference. However, ordinal measurements do not allow
you to determine the size of the difference between two individuals. In a NASCAR race, for example, the
first-place car finished faster than the second-place car, but the ranks do not tell you how much faster.
Other examples of ordinal scales include socioeconomic class (upper, middle, lower) and T-shirt sizes
(small, medium, large). In addition, ordinal scales are often used to measure variables for which it is
difficult to assign numerical scores. For example, people can rank their food preferences but might have
trouble explaining “how much” they prefer chocolate ice cream to steak.

Interval and Ratio Scale


Both an interval scale and a ratio scale consist of a series of ordered categories (like an ordinal
scale) with the additional requirement that the categories form a series of intervals that are all the same
size. Thus, the scale of measurement consists of a series of equal intervals, such as inches on a ruler.
Other examples of interval and ratio scales are the measurement of time in seconds, weight in pounds, and
temperature in degrees Fahrenheit. Note that, in each case, one interval (1 inch, 1 second, 1 pound, 1
degree) is the same size, no matter where it is located on the scale. The fact that the intervals are all the
same size makes it possible to determine both the size and the direction of the difference between two
measurements. For example, you know that a measurement of 80° Fahrenheit is higher than a measure of
60°, and you know that it is exactly 20° higher.
An interval scale consists of ordered categories that are all intervals of the same size. Equal differences between numbers on the scale reflect equal differences in magnitude. However, the zero point on an interval scale is arbitrary and does not indicate a zero amount of the variable being measured. A ratio scale is an interval scale with the additional feature of an absolute zero point. With a ratio scale, ratios of numbers do reflect ratios of magnitude.
The factor that differentiates an interval scale from a ratio scale is the nature of the zero point. An
interval scale has an arbitrary zero point. That is, the value 0 is assigned to a particular location on the
scale simply as a matter of convenience or reference. A value of zero does not indicate a total absence of
the variable being measured. For example, a temperature of 0º Fahrenheit does not mean that there is no
temperature, and it does not prohibit the temperature from going even lower. Interval scales with an
arbitrary zero point are relatively rare. The two most common examples are the Fahrenheit and Celsius
temperature scales. Other examples include golf scores (above and below par) and relative measures such
as above and below average rainfall.
A ratio scale is anchored by a zero point that is not arbitrary but rather is a meaningful value
representing none (a complete absence) of the variable being measured. The existence of an absolute,
non-arbitrary zero point means that we can measure the absolute amount of the variable; that is, we can
measure the distance from 0. This makes it possible to compare measurements in terms of ratios. For
example, a gas tank with 10 gallons (10 more than 0) has twice as much gas as a tank with only 5 gallons
(5 more than 0). Also note that a completely empty tank has 0 gallons. To recap, with a ratio scale, we can
measure the direction and the size of the difference between two measurements, and we can describe the
difference in terms of a ratio. Ratio scales are quite common and include physical measures such as
height and weight, as well as variables such as reaction time or the number of errors on a test.

Another way of differentiating the scales of measurement is through the properties that define them. Generally, there are 3 main properties of scales of measurement. We have already touched on them above but will now focus on them in this section.
1. Magnitude
This is the property of “moreness”. A scale has the property of magnitude if we can say
that a particular instance of the attribute represents more, less, or equal amounts of the given
quantity than does another instance (Aron & Aron, 2003; Hurlburt, 2003; McCall, 2001; Howell,
2008). On a scale of height, for example, if we can say that John is taller than Fred, then the scale
has the property of magnitude. A scale that does not have this property arises, for example, when
a gym coach assigns identification numbers to teams in a league (team 1, team 2, and so forth).
Because the numbers only label the teams, they do not have the property of magnitude. If the
coach were to rank the teams by the number of games they have won, then the new numbering
system (games won) would have the property of magnitude.
2. Equal Intervals
A scale has the property of equal intervals if the difference between two points at any
place on the scale has the same meaning as the difference between two other points that differ by
the same number of scale units. For example, the difference between inch 2 and inch 4 on a ruler
represents the same quantity as the difference between inch 10 and inch 12: exactly 2 inches.
3. Absolute 0
An absolute 0 is obtained when nothing of the property being measured exists. For
example, if you are measuring heart rate and observe that your patient has a rate of 0 and has
died, then you would conclude that there is no heart rate at all. For many psychological qualities,
it is extremely difficult, if not impossible, to define an absolute 0 point. For example, if one
measures shyness on a scale from 0 through 10, then it is hard to define what it means for a
person to have absolutely no shyness (McCall, 2001).

The presence of these properties in the scales discussed above is summarized in the table below.

Scale       Magnitude   Equal Intervals   Absolute 0
Nominal     No          No                No
Ordinal     Yes         No                No
Interval    Yes         Yes               No
Ratio       Yes         Yes               Yes

Presentation of Data

Definition of Frequency Distribution


Based on the previous lesson, descriptive statistics aims to simplify the organization and presentation of data. This goal allows a researcher to organize scores into a simpler, easier-to-understand form, hence making the data easily communicated to others. One of the most common procedures for organizing a set of data is to place the scores in a frequency distribution.
Frequency distribution is an organized tabulation of score values and their frequency of
occurrence.
A frequency distribution takes a disorganized set of scores and places them in order from highest
to lowest, grouping together individuals who all have the same score. If the highest score is X = 10, for
example, the frequency distribution groups together all the 10s, then all the 9s, then the 8s, and so on. A
frequency distribution can be structured either as a table or as a graph, but in either case, the distribution
presents the same two elements:
1. The set of categories or scores that make up the original measurement scale.
2. A record of the frequency, or number of individuals in each category.

Frequency Distribution Table


The simplest frequency distribution table presents the measurement scale by listing the different
measurement categories (S values) in a column from highest to lowest. Beside each S value, we indicate
the frequency, or the number of times that particular measurement occurred in the data. It is customary to
use an S as the column heading for the scores and an f as the column heading for the frequencies.

A. Ungrouped Frequency Distribution Table


This is useful in organizing small data, specifically when the number of scores obtained
is below 30.
Example: The following set of scores was obtained from a 10-point statistics quiz. These scores
can be organized by constructing a frequency distribution table.
8 9 8 7 10 9 6 4 9 8
7 8 10 9 8 6 9 7 8 8

S     f     rf      p
4     1     0.05    5%
5     0     0.00    0%
6     2     0.10    10%
7     3     0.15    15%
8     7     0.35    35%
9     5     0.25    25%
10    2     0.10    10%
      ∑f = 20    ∑rf = 1.00    ∑p = 100%
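A quick way to reproduce this table is with Python's collections.Counter. This is a minimal sketch using the 20 quiz scores above; the column names S, f, rf, and p follow the table's conventions.

    from collections import Counter

    scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8,
              7, 8, 10, 9, 8, 6, 9, 7, 8, 8]

    freq = Counter(scores)              # frequency (f) of each score value (S)
    n = len(scores)

    print("S   f   rf     p")
    for s in range(min(scores), max(scores) + 1):
        f = freq.get(s, 0)              # values with zero frequency still get a row
        rf = f / n                      # relative frequency
        print(f"{s:<3} {f:<3} {rf:<6.2f} {rf:.0%}")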

B. Grouped Frequency Distribution Table


This is useful in organizing large data, specifically when the number of scores obtained is
30 and above.
Example: A researcher had 50 freshmen psychology students keep a diary of their social
interactions for a week during the regular trimester. Each time a participant had a social
interaction lasting 10 minutes or longer, he or she would fill out a card. The card had questions
about various aspects of the conversation and the conversation partner. Excluding family and
work situations, the number of social interactions 10 minutes or longer over a week for these
students were as follows:
53 15 33 3 21 19 17 16 44 25
30 3 5 9 35 32 26 13 14 14
47 47 18 11 5 19 24 17 6 25
8 18 29 3 19 22 3 22 29 2
6 10 29 10 29 21 38 41 16 17

Class Intervals f cf
2-9 11 11
10-17 12 23
18-25 12 35
26-33 8 43
34-41 3 46
42-49 3 49
50-57 1 50
N = 50
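The same grouping can be sketched in Python. Here the class intervals (width 8, starting at 2) are taken from the table above; everything else is plain bookkeeping.

    # Social-interaction counts for the 50 students (from the example above).
    data = [53, 15, 33, 3, 21, 19, 17, 16, 44, 25,
            30, 3, 5, 9, 35, 32, 26, 13, 14, 14,
            47, 47, 18, 11, 5, 19, 24, 17, 6, 25,
            8, 18, 29, 3, 19, 22, 3, 22, 29, 2,
            6, 10, 29, 10, 29, 21, 38, 41, 16, 17]

    cumulative = 0
    print("Class Interval   f    cf")
    for lower in range(2, 58, 8):                   # intervals 2-9, 10-17, ..., 50-57
        upper = lower + 7
        f = sum(lower <= x <= upper for x in data)  # frequency in this interval
        cumulative += f                             # cumulative frequency (cf)
        print(f"{lower}-{upper:<12} {f:<4} {cumulative}")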

Frequency Distribution Graph


A frequency distribution graph is basically a picture of the information available in a frequency
distribution table. It has two perpendicular lines called axes.
Guidelines:
1. A graph has two axes: vertical and horizontal. The vertical axis is called the ordinate, or Y axis,
and the horizontal axis is the abscissa, or X axis.
2. Very often the independent variable is plotted on the X axis and the dependent variable on the Y
axis. In graphing a frequency distribution, the score values are usually plotted on the X axis and
the frequency of the score values is plotted on the Y axis.
3. Suitable units for plotting scores should be chosen along the axes.
4. To avoid distorting the data, it is customary to set the intersection of the two axes at zero and then
choose scales for the axes such that the height (Y-axis) of the graphed data is about two-thirds or
three-fourths of the length (X-axis).

I. Graphs for Nominal or Ordinal Data


When the scores are measured on a nominal or ordinal scale (usually non-numerical values), the frequency distribution can be displayed in a bar graph. Remember that nominal data includes classifications only, while ordinal data includes rankings. Nominal data can be males and females; Baguio, La Union, and Pangasinan; or Roman Catholic, Christian, and Muslim. Ordinal data can be small, medium, and large, or 1st, 2nd, and 3rd.

Bar Graph
Frequency distributions of nominal or ordinal data are customarily plotted using a bar
graph. This type of graph is shown below. A bar is drawn for each category, where the
height of the bar represents the frequency or number of members of that category. Since
there is no numerical relationship between the categories in nominal data, the various
groups can be arranged along the horizontal axis in any order. In the figure found below,
they are arranged from left to right according to the magnitude of frequency in each
category. Note that the bars for each category in a bar graph do not touch each other. This
further emphasizes the lack of a quantitative relationship between the categories. For a
nominal scale, the space between bars emphasizes that the scale consists of separate,
distinct categories. For ordinal scales, separate bars are used because you cannot assume
that the categories are all the same size.
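A bar graph like the one described can be sketched with matplotlib. The categories and frequencies below are invented for illustration; note the space left between bars for the separate, distinct categories.

    import matplotlib.pyplot as plt

    # Hypothetical nominal data: number of students per academic major.
    majors = ["Business", "Biology", "Art", "Chemistry"]
    frequencies = [35, 28, 20, 12]

    # A bar width below 1 leaves space between bars, emphasizing that the
    # categories are separate and distinct.
    plt.bar(majors, frequencies, width=0.6)
    plt.xlabel("Academic Major")   # categories on the X axis (abscissa)
    plt.ylabel("Frequency")        # frequency on the Y axis (ordinate)
    plt.show()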

II. Graphs for Interval and Ratio Data


Scores or values other than these are considered either interval or ratio data. Examples include test scores, time, number of children, and more. When dealing with these data, use the following graphs.

A. Histogram
The histogram is used to represent frequency distributions composed of interval or ratio
data. It resembles the bar graph, but with the histogram, a bar is drawn for each class interval.
The class intervals or their midpoints (grouped frequency table) or the scores (ungrouped
frequency table) are plotted on the horizontal axis such that each class bar begins and
terminates at the real limits of the interval or the midpoint or score is aligned with the center
of the bar. The height of the bar corresponds to the frequency of the class interval. Since the
intervals are continuous, the vertical bars must touch each other rather than be spaced apart as
is done with the bar graph. This type of graph is most appropriate for discrete data.
B. Frequency Polygon
The frequency polygon is also used to represent interval or ratio data. The horizontal axis
is identical to that of the histogram. However, for this type of graph, instead of using bars, a
point is plotted over the midpoint of each interval at a height corresponding to the frequency
of the interval. The points are then joined with straight lines. Finally, the line joining the
points is extended to meet the horizontal axis at the midpoint of the two class intervals falling
immediately beyond the end class intervals containing scores. This closing of the line with
the horizontal axis forms a polygon, from which the name of this graph is taken. This type of
graph is most appropriate for continuous data – scores that can have decimal values.
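Both graphs can be sketched from the grouped frequency table built earlier. This is a minimal matplotlib sketch; the interval midpoints and frequencies come from the social-interaction example, and the polygon is closed at the midpoints of the empty intervals on either end.

    import matplotlib.pyplot as plt

    # Interval midpoints and frequencies from the grouped table built earlier.
    midpoints = [5.5, 13.5, 21.5, 29.5, 37.5, 45.5, 53.5]
    freqs = [11, 12, 12, 8, 3, 3, 1]

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    # Histogram: one bar per class interval; width 8 makes adjacent bars touch.
    ax1.bar(midpoints, freqs, width=8, edgecolor="black")
    ax1.set(title="Histogram", xlabel="Score", ylabel="Frequency")

    # Frequency polygon: a point over each midpoint, closed at the midpoints
    # of the empty intervals on either end (frequency 0).
    ax2.plot([-2.5] + midpoints + [61.5], [0] + freqs + [0], marker="o")
    ax2.set(title="Frequency Polygon", xlabel="Score", ylabel="Frequency")

    plt.tight_layout()
    plt.show()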

Difference between Histogram and Polygon


The major difference between a histogram and a frequency polygon is the following: The
histogram displays the scores as though they were equally distributed over the interval, whereas the
frequency polygon displays the scores as though they were all concentrated at the midpoint of the
interval. Some investigators prefer to use the frequency polygon when they are comparing the shapes of
two or more distributions. The frequency polygon also has the effect of displaying the scores as though
they were continuously distributed, which in many instances is the case. Another difference is that
histograms are commonly used with discrete data while frequency polygons, with continuous data.

Normal Distribution Curve and Standard Scores


When a population consists of numerical scores from an interval or a ratio scale, it is customary to draw the distribution with a smooth curve instead of the jagged, stepwise shapes that occur with histograms and polygons. The smooth curve indicates that you are not connecting a series of dots (real frequencies) but instead are showing the relative changes that occur from one score to the next. One commonly occurring population distribution is the normal curve. The word normal refers to a specific shape that can be precisely defined by an equation. Less precisely, we can describe a normal distribution as being symmetrical, with the greatest frequency in the middle and relatively smaller frequencies as you move toward either extreme. A normal curve is also called a Gaussian curve, normal distribution curve (NDC), or bell curve. An example of a normal distribution curve and its parts can be seen below.

[Figure: a normal curve with its peak at the mean, tails at the lower and upper ends, and the baseline marked at 3SD, 2SD, and 1SD below the mean and 1SD, 2SD, and 3SD above the mean.]

A normal distribution curve is a mathematical model that only ASSUMES that the distribution of scores is normal. It does not exist in real data; actual distributions only approximate it. A normal distribution curve has the following characteristics:
1. It is a symmetrical bell-shaped curve.
2. The mean, median, and mode are identical.
3. It is asymptotic, as the curve never touches the baseline.
Shape of a Frequency Distribution
A distribution of scores can be described by 3 main characteristics: central tendency, variability,
and shape. We have already discussed central tendency and variability, so we move to shape. The shape
of a distribution can be seen by assessing the curve that the scores form. We have discussed how normal
curves are formed when normal distribution is assumed. But most of the time, distributions are not
guaranteed to have normal distributions. This can be assessed by tracing the curve formed by histograms
of scores just like in the example below. A normal curve can be drawn on histograms to check if the bars
form a shape just like a normal distribution does. This is also the reason why frequency polygons are
preferred. They immediately show the shape of the distribution without the need of tracing a curve.

In a symmetrical distribution, it is possible to draw a vertical line through the middle so that one
side of the distribution is a mirror image of the other. A symmetrical distribution mirrors a normal curve.
In a skewed distribution, the scores tend to pile up toward one end of the scale and taper off gradually at
the other end. The section where the scores taper off toward one end of a distribution is called the tail of
the distribution. A skewed distribution with the tail on the right-hand side is positively skewed because
the tail points toward the positive (above-zero) end of the X-axis. If the tail points to the left, the
distribution is negatively skewed. These categories are summarized and depicted through the figures
below.
Shape of a Distribution
 Symmetrical Distribution
 Skewed Distribution
o Positively Skewed Distribution
o Negatively Skewed Distribution

A symmetrical distribution can be obtained if the scores are normally distributed from the
highest to the lowest point. That is, there is no cluster of scores on one side. A skewed distribution
happens when scores are clustered only on one side either on the lower or the upper end of the
distribution. Look at the examples below.
Symmetrical Distribution 2,3,4,5,6,7,8,9,7,8,6,7,4,5,2,10
Positively Skewed Distribution 2,2,2,2,3,3,3,4,2,3,10,8,7
Negatively Skewed Distribution 2,2,3,8,8,9,7,10,10,10

In a positively skewed distribution, there are more lower scores while in a negatively skewed
distribution, there are more higher scores. For example, in an exam, if most of the students got a low
score, the distribution will become positively skewed as they would cluster on the lower end of the curve.
But, if most of the students got a high score, it would produce a negatively skewed distribution as most of
the scores would be found in the upper end. In the example, a positively skewed distribution indicates
that the exam is hard while a negatively skewed distribution indicates that the exam is easy.
Not all distributions are perfectly symmetrical or obviously skewed in one direction. Therefore, it is common to modify these descriptions of shape with phrases like “roughly symmetrical” or “tends to be positively skewed.” The goal is to provide a general idea of the appearance of the distribution.
The figure below summarizes how a curve can be seen in the histograms of the following distributions. You can see a normal curve, a positively skewed curve, and a negatively skewed curve. Aside from comparing the shape of histograms to a normal curve, there are other ways of assessing the shape of a distribution. You can also use your quartiles in doing so. If Q3−Q2 is not equal to Q2−Q1, this can be indicative of a skewed distribution, and vice versa. If Q2−Q1 is greater than Q3−Q2, then the distribution is negatively skewed, and vice versa for a positively skewed one. Another way is checking the mean, median, and mode. As discussed earlier, in a normal distribution, the three measures are equal. Therefore, if they are far apart, that can be indicative of a skewed distribution. These three measures can also be used to check whether a distribution is positively or negatively skewed. If the distribution of data is skewed to the left, the mean is less than the median, which is often less than the mode. If the distribution of data is skewed to the right, the mode is often less than the median, which is less than the mean. These can be seen in the figure below.
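These checks are easy to script. Below is a minimal Python sketch, applied to the small skewed example sets from above, that compares the mean against the median and Q2−Q1 against Q3−Q2; note that statistics.quantiles uses one of several common quartile conventions, so the exact cut points may differ slightly from hand computation.

    import statistics

    def describe_shape(scores):
        q1, q2, q3 = statistics.quantiles(scores, n=4)  # quartiles Q1, Q2, Q3
        mean = statistics.mean(scores)
        if mean > q2:
            shape = "positively skewed (mean above median)"
        elif mean < q2:
            shape = "negatively skewed (mean below median)"
        else:
            shape = "roughly symmetrical"
        print(f"mean={mean:.2f}  median={q2:.2f}  "
              f"Q2-Q1={q2 - q1:.2f}  Q3-Q2={q3 - q2:.2f}  -> {shape}")

    describe_shape([2, 2, 2, 2, 3, 3, 3, 4, 2, 3, 10, 8, 7])  # positively skewed set
    describe_shape([2, 2, 3, 8, 8, 9, 7, 10, 10, 10])         # negatively skewed set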

Kurtosis
In relation to its shape, a distribution can also be described in terms of whether the middle of the
distribution is peaked or flat. In describing this characteristic, the standard of comparison is a bell-shaped
curve. Kurtosis is how much the shape of a distribution differs from a normal curve in terms of whether
its curve in the middle is more peaked or flat than the normal curve. There are three types of kurtosis:
mesokurtic, leptokurtic, and platykurtic. A mesokurtic curve (a) is the same as a normal curve. It has a
medium peaked height, and its tails are spread properly. A leptokurtic curve (b) has a very peaked curve
as it has more scores in the tails of the distribution than the normal curve. It is as if the normal curve got
pinched in the middle and some of it went up into a sharp peak and the rest spread out into thick tails. A
platykurtic curve (c) has a flatter curve with fewer scores in the tails of the distribution than the normal
curve. It is as if the tails and the top of the curve both got sucked in toward the middle on both sides.

Areas under the Curve


In distributions that are normally shaped, there is a special relationship between the mean and the
standard deviation regarding the area contained under the curve. When a set of scores is normally
distributed, 34.13% of the area under the curve is contained between the mean and 1SD above or below
the mean. 13.59% of the area is contained between 1SD and 2SD above or below the mean. 2.15% of the
area is contained between 2SD and 3SD above or below the mean. 0.13% of the area exists beyond 3SD
above or below the mean. Since frequency is plotted on the vertical axis, these percentages represent the
percentage of scores contained within the area.
To illustrate, suppose we have a population of 10,000 IQ scores. The distribution is normally
shaped, where the mean is 100 and the SD is 16. Since the scores are normally distributed, 34.13% of the
scores are contained between scores of 100 and 116, 13.59% between 116 and 132, 2.15% between 132
and 148, and 0.13% above 148. Similarly, 34.13% of the scores fall between 84 and 100, 13.59%
between 68 and 84, 2.15% between 52 and 68, and 0.13% below 52.
To calculate the number of scores in each area, all we need to do is multiply the relevant percentage by the total number of scores. Thus, there are 34.13% × 10,000 = 3413 scores between 100 and 116, 13.59% × 10,000 = 1359 scores between 116 and 132, and 215 scores between 132 and 148; 13 scores are greater than 148. For the other half of the distribution, there are 3413 scores between 84 and 100, 1359 scores between 68 and 84, and 215 scores between 52 and 68; there are 13 scores below 52.
Note that these frequencies would be true only if the distribution is exactly normally distributed. In actual
practice, the frequencies would vary slightly depending on how close the distribution is to this theoretical
model. This can be seen in the figure below.
If you want to identify the percentage of scores that are within 1SD of the mean, just add 34.13% from above 1SD and 34.13% from below 1SD, which is equal to 68.26%. If it is 2SD from the mean, then it is 95.44%, and 3SD will be 99.74%. This is seen in the figure below.
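You can verify these textbook percentages with the standard normal cumulative distribution function, available in Python's standard library as statistics.NormalDist. A minimal sketch using the IQ example (mean 100, SD 16); the exact values differ very slightly from the rounded figures above.

    from statistics import NormalDist

    iq = NormalDist(mu=100, sigma=16)   # the IQ example above

    # Proportion of scores between the mean and 1SD above it (~34.13%).
    print(f"100 to 116: {iq.cdf(116) - iq.cdf(100):.4f}")

    # Proportion within 1, 2, and 3 SD of the mean (~68.3%, 95.4%, 99.7%).
    for k in (1, 2, 3):
        p = iq.cdf(100 + k * 16) - iq.cdf(100 - k * 16)
        print(f"within {k}SD: {p:.4f}")

    # Expected number of scores between 132 and 148 out of 10,000 (~215).
    print(f"{10_000 * (iq.cdf(148) - iq.cdf(132)):.0f}")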

Standard Scores
A z-score transformation is a statistical technique that uses the mean and the standard deviation to transform each score (X value) into a z-score, or standard score. The purpose of z-scores, or standard scores, is to identify and
describe the exact location of each score in a distribution. A score by itself does not necessarily provide
much information about its position within a distribution. These original, unchanged scores that are the
direct result of measurement are called raw scores. To make raw scores more meaningful, they are often
transformed into new values that contain more information. This transformation is one purpose for z-
scores. We transform X values into z-scores so that the resulting z-scores tell exactly where the original
scores are located. A second purpose for z-scores is to standardize an entire distribution. A common
example of a standardized distribution is the distribution of IQ scores. Although there are several
different tests for measuring IQ, the tests usually are standardized so that they have a mean of 100 and a
standard deviation of 15. Because most of the different tests are standardized, it is possible to understand
and compare IQ scores even though they come from different tests. For example, we all understand that
an IQ score of 95 is a little below average, no matter which IQ test was used. Similarly, an IQ of 145 is
extremely high, no matter which IQ test was used. In general terms, the process of standardizing takes
different distributions and makes them equivalent. The advantage of this process is that it is possible to
compare distributions even though they may have been quite different before standardization. In
summary, the process of transforming X values into z-scores serves two useful purposes:
1. Each z-score tells the exact location of the original X value within the distribution.
2. The z-scores form a standardized distribution that can be directly compared to other
distributions that also have been transformed into z-scores.
One of the primary purposes of a z-score is to describe the exact location of a score within a
distribution. The z-score accomplishes this goal by transforming each X value into a signed number (+ or
−) so that the sign tells whether the score is located above (+) or below ( −) the mean, and the number
tells the distance between the score and the mean in terms of the number of standard deviations. Thus, in
a distribution of IQ scores with μ = 100 and σ = 15, a score of X = 130 would be transformed into z =
+2.00. The z-score value indicates that the score is located above the mean (+) by a distance of 2 standard
deviations (30 points).
All z-scores above the mean are positive and all z-scores below the mean are negative. The sign
of a z-score tells you immediately whether the score is located above or below the mean. Also, note that a
z-score of z = +1.00 corresponds to a position exactly 1 standard deviation above the mean. A z-score of
z = +2.00 is always located exactly 2 standard deviations above the mean. The numerical value of the z-
score tells you the number of standard deviations from the mean. The locations identified by z-scores are
the same for all distributions, no matter what mean or standard deviation the distributions may have. The
normal curve is used as a reference in using the z-score just like in the figure below.

Standard Scores
1. Z-score
 Results from the conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean of the distribution
 Equal to the difference between a raw score and the mean, divided by the standard deviation
 Allows for comparison between scores of two different tests
 The scale used in computing z-scores is called a “zero plus or minus one” scale, with 0 being the mean and 1 the standard deviation

 For example, Juan got 20 in his quiz that has a mean of 15 and an SD of 4. Compute the z-score equivalent to the raw score of 20.
o z = (20 − 15) / 4 = 5/4 = 1.25
o Therefore, Juan’s raw score of 20 is equivalent to a z-score of 1.25
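As a minimal sketch, the same computation in Python (the mean, SD, and Juan's score are taken from the example above):

    def z_score(raw, mean, sd):
        """Distance of a raw score from the mean, in standard deviation units."""
        return (raw - mean) / sd

    print(z_score(20, 15, 4))     # -> 1.25 (Juan's quiz score)
    print(z_score(130, 100, 15))  # -> 2.0  (the IQ example above)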
2. T-scores
 Score with a mean of 50 and a standard deviation of 10
 Devised by W. A. McCall and called the T-score in honor of his professor, E. L. Thorndike
 Composed of a scale that ranges from 5 standard deviations below the mean to 5 standard deviations above the mean
 Range is from 0 – 100 with 50 being the middle
 T-scores do not have scores with a negative sign, unlike z-scores
3. Stanine
 Has a mean of 5 and standard deviation of 2
 Divides the curve into 9 units
 Takes on whole values from 1 to 9

 To determine what raw score/s represent each stanine, use the table below. Also, you can see the equivalent percentile and percentage in the table.
STANINE   Z               PERCENTAGE OF SCORES   PERCENTILE
9         Above 1.75      Top 4%                 Above 96
8         1.25 to 1.75    Next Top 7%            89 – 96
7         .75 to 1.25     Next Top 12%           77 – 89
6         .25 to .75      Next Top 17%           60 – 77
5         -.25 to .25     Middle 20%             40 – 60
4         -.75 to -.25    Next Bottom 17%        23 – 40
3         -1.25 to -.75   Next Bottom 12%        11 – 23
2         -1.75 to -1.25  Next Bottom 7%         4 – 11
1         Below -1.75     Bottom 4%              Below 4

 For example, let’s try to see what raw score is equivalent to stanine 1 in a test where the mean is 15 and the SD is 4.
o Here, we use the z-score formula: z = (X − Mean) / SD
o Based on the table, stanine 1 is equivalent to a z-score of −1.75, so we substitute that and compute accordingly:
o −1.75 = (X − 15) / 4
o (−1.75)(4) = X − 15
o −7 = X − 15
o X = 15 − 7 = 8
o Therefore, any raw score BELOW 8 is equivalent to stanine 1.
 Let’s have another example, stanine 2. We do the same process as above; however, for the z-score, get the upper limit. In this case, stanine 2 is equivalent to a z-score of −1.75 to −1.25, so the upper limit is −1.25.
o −1.25 = (X − 15) / 4
o (−1.25)(4) = X − 15
o −5 = X − 15
o X = 15 − 5 = 10
o Therefore, any raw score from 8 to 10 is equivalent to stanine 2.
 You can compute the other stanines on your own; a Python sketch that automates the computation follows the complete table below. Note: in making the next interval, start with the next value. For example, stanine 2 is from 8 to 10, so in making stanine 3, you start with 11 up to your computed value.
Stanine Score
9 Above 22
8 21 – 22
7 19 – 20
6 17 – 18
5 15 – 16
4 13 – 14
3 11 – 12
2 8 – 10
1 Below 8
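The boundary computation above generalizes easily. Below is a minimal Python sketch that rebuilds this cutoff table for any mean and SD, using the stanine upper-limit z-scores from the table and the “start with the next value” convention for whole-number scores; the function name is ours, not standard.

    def stanine_intervals(mean, sd):
        """Whole-number raw-score intervals for stanines 1 through 9."""
        # Upper-limit z-scores for stanines 1-8 (stanine 9 is open-ended).
        upper_z = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]
        cuts = [round(mean + z * sd) for z in upper_z]   # X = Mean + (z)(SD)

        intervals = {1: f"Below {cuts[0]}"}
        low = cuts[0]
        for stanine, high in zip(range(2, 9), cuts[1:]):
            intervals[stanine] = f"{low} - {high}"
            low = high + 1            # "start with the next value"
        intervals[9] = f"Above {cuts[-1]}"
        return intervals

    for stanine, interval in sorted(stanine_intervals(15, 4).items(), reverse=True):
        print(stanine, interval)      # reproduces the table above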
 You can also check to see if your equivalent scores are correct. Let’s try with stanine 2. Our computed equivalent scores are from 8 to 10. Let’s see what z-score a raw score of 9 (between 8 and 10) is equivalent to; it should fall in the same stanine.
o z = (9 − 15) / 4 = −1.5
o A raw score of 9 is equivalent to a z-score of −1.5. Based on the table above, it is in stanine 2 (z = −1.75 to −1.25), implying our computation is correct.

 Some recommended interpretations for stanines can be seen below.

4. A or CEEB Scores
 Has a mean of 500 and standard deviation of 100
 Used in the Scholastic Aptitude Test (SAT) and Graduate Record Examination (GRE)
5. Sten Scores
 Divides the distribution into 10 equal parts
 Has a mean of 5.5 and standard deviation of 2.2

 To determine what raw score/s represent each sten, use the table below. Also, you can see the equivalent percentile and percentage in the table.
STEN   Z               %                  PERCENTILE
10     Above 2.0       Top 2.3            Above 97.9
9      1.5 to 2.0      Next Top 4.4       93.5 – 97.9
8      1.0 to 1.5      Next Top 9.2       84.3 – 93.5
7      .5 to 1.0       Next Top 15.0      69.3 – 84.3
6      0.0 to .5       Middle 19.2        50.9 – 69.3
5      -.5 to 0.0      Middle 19.2        30.9 – 50.1
4      -1.0 to -.5     Next Bottom 15.0   15.9 – 30.9
3      -1.5 to -1.0    Next Bottom 9.2    6.7 – 15.9
2      -2.0 to -1.5    Next Bottom 4.4    2.3 – 6.7
1      Below -2.0      Bottom 2.3         Below 2.3
 The same process used for obtaining stanines is used for stens. Just compute the raw scores given the z-scores in the table above.
 Try to compute a sten table on your own. The complete table can be seen below. Remember, the mean for this test is 15 and the SD is 4.
STEN   SCORE
10     Above 23
9      22 – 23
8      20 – 21
7      18 – 19
6      16 – 17
5      14 – 15
4      12 – 13
3      10 – 11
2      7 – 9
1      Below 7
 A recommended interpretation for sten scores can be seen below.
STEN   LEVEL
8-10   Very High
7      High
5-6    Average
4      Low
1-3    Very Low
6. AGCT Score
 Has a mean of 100 and SD of 20
7. IQ or Deviation IQ (Intelligence Quotient)
 Has a mean of 100 and a standard deviation of 15

Transforming Raw Scores into Various Standard Scores


When transforming raw scores into standard scores, you can use the z-score. The general formula is as follows:
Standard Score = Mean + (z)(SD)

For example, let us transform the raw score of Juan (20) into other standard scores. Take note, the mean and SD of all the other standard scores can be seen above.
Let’s try to transform it into a T score. Remember, the mean for the T score is 50 and the SD is 10. Therefore,
T = 50 + (1.25)(10) = 62.5
You can solve the others on your own by following the example above. The complete table can
be seen below.
Raw Score   z-score   T score   A or CEEB score   Stanine   Sten       DIQ            AGCT
20          1.25      62.5      625               7.5 = 8   8.25 = 8   118.75 = 119   125
119
Although do take note, a disadvantage of stanine and sten scores is that their units are not all equally sized, meaning a raw score may be closer to the next stanine than the one identified. For example, Juan’s score of 20, when we transformed it with the formula, is equivalent to stanine 8. However, in the table we built above (same mean and SD), it falls in stanine 7. This portrays the disadvantage: we could argue that it is in stanine 7, but it could also be in 8 since it is close to the range of stanine 8.
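A minimal Python sketch of the general formula, reproducing the row above; the means and SDs are the ones listed for each scale earlier, and the function name is ours.

    # (mean, SD) of each standard-score scale listed earlier.
    SCALES = {"T": (50, 10), "CEEB": (500, 100), "Stanine": (5, 2),
              "Sten": (5.5, 2.2), "DIQ": (100, 15), "AGCT": (100, 20)}

    def standard_scores(raw, mean, sd):
        z = (raw - mean) / sd                      # convert to a z-score first
        return {name: m + z * s for name, (m, s) in SCALES.items()}

    # Juan's quiz score of 20 on a test with mean 15 and SD 4.
    for name, value in standard_scores(20, 15, 4).items():
        print(f"{name}: {value:g}")
    # -> T: 62.5, CEEB: 625, Stanine: 7.5, Sten: 8.25, DIQ: 118.75, AGCT: 125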

A suggested interpretation guide for the standard scores can be seen below.
LEVEL           z               T              AGCT            CEEB            DIQ
Very Superior   1.33 and above  63 and above   127 and above   633 and above   130 and above
Superior        .67 to 1.32     57 – 62        113 – 126       567 – 632       120 – 129
Above Average   -.67 to .66     43 – 56        87 – 112        433 – 566       110 – 119
Average         -1.33 to -.66   37 – 42        73 – 86         367 – 432       90 – 109
Below Average   Below -1.33     36 and below   72 and below    366 and below   80 – 89
Borderline                                                                     70 – 79
Impaired                                                                       69 and below

Measures of Central Tendency


The general purpose of descriptive statistical methods is to organize and summarize a set of
scores. Perhaps the most common method for summarizing and describing a distribution is to find a single
value that defines the average score and can serve as a typical example to represent the entire distribution.
In statistics, the concept of an average or representative score is called central tendency. Central tendency
is a statistical measure to determine a single score that defines the center of a distribution. The goal in
measuring central tendency is to describe a distribution of scores by determining a single value that
identifies the center of the distribution. Ideally, this central value will be the score that is the best
representative value for all the individuals in the distribution.
In everyday language, central tendency attempts to identify the “average” or “typical” individual.
This average value can then be used to provide a simple description of an entire population or a sample.
In addition to describing an entire distribution, measures of central tendency are also useful for making
comparisons between groups of individuals or between sets of data.
For example, let us say the average income in Baguio City is PHP13,000 while in Manila, it is
PHP20,000. This example shows how we can describe a large set of data with a single, representative
number. Central tendency characterizes what is typical for a large population and in doing so makes large
amounts of data more digestible. Instead of having thousands of incomes per month per individual in each
city, you can use the central tendency measure to compare both cities. Statisticians sometimes use the
expression “number crunching” to illustrate this aspect of data description. That is, we take a distribution
consisting of many scores and “crunch” them down to a single value that describes them all.

Mean
The mean, also known as the arithmetic average, is computed by adding all the scores in the distribution and dividing by the number of scores. The mean for a population is identified by the Greek letter μ (mu, pronounced “mew”), and the mean for a sample is identified by M (common in research) or X̄ (common in textbooks; read “x-bar”).
Characteristics of the Mean
1. The mean represents dividing the total score and distributing it equally among the individuals. If
the total money is PHP20 and there are 4 individuals, then it can be said that everyone has
PHP5.00 each. In this case, the mean is also PHP5.00.
2. The mean serves as the balance point. It is the center of the distribution, so it ALWAYS occurs
between the highest and lowest score.
3. Changing any score in the distribution can affect the mean.
4. Adding or removing a score from the distribution will change the mean UNLESS the new or removed score is equal to the mean.
 Example: If you have a set of scores, 1,2,3,4,5, the mean would be 3. If you remove the score 3, you will only have 1,2,4,5, but the mean would still be 3. If you add a score, 3, you will have 1,2,3,3,4,5, but the mean would still be 3.
5. Adding, subtracting, multiplying, or dividing the scores by a constant will make the mean change the same way.
 Example: If you multiply all your scores, 1,2,3,4,5, by 3, then you will have 3,6,9,12,15. In the original set, the mean is 3, and if you check the new set, the mean is 9 (the original mean multiplied by 3).
The formula for the mean can be seen below. For a population of N scores, μ = ∑X / N; for a sample of n scores, M = ∑X / n.

Median
If the scores in a distribution are listed in order from smallest to largest, the median is the
midpoint of the list. It divides the distribution of scores into two equal parts. More specifically, the
median is the point on the measurement scale below which 50% of the scores in the distribution are
located. There are no specific symbols depicting the median but for easier identification, we will use Md.

Mode
The mode is the score or category that has the greatest frequency; that is, the score that occurs or appears the most. The word mode means “the customary fashion” or “a popular style.” There are no
symbols or special notation used to identify the mode or to differentiate between a sample mode and a
population mode. For easier understanding, we will use Mo to represent mode. The mode is a useful
measure of central tendency because it can be used to determine the typical or most frequent value for any
scale of measurement, including a nominal scale.
The mode also can be useful because it is the only measure of central tendency that corresponds
to an actual score in the data; by definition, the mode is the most frequently occurring score. The mean
and the median, on the other hand, are both calculated values and often produce an answer that does not
equal any score in the distribution.
In a frequency distribution graph, the greatest frequency will appear as the tallest part of the
figure. Although a distribution will have only one mean and only one median, it is possible to have more
than one mode. Specifically, it is possible to have two or more scores that have the same highest
frequency. In a frequency distribution graph, the different modes will correspond to distinct, equally high
peaks. A distribution with two modes is said to be bimodal, and a distribution with more than two modes
is called multimodal. Occasionally, a distribution with several equally high points is said to have no
mode.
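Python's statistics module computes all three measures directly. A minimal sketch with invented scores; note that statistics.multimode returns every mode in a bimodal set.

    import statistics

    scores = [2, 3, 3, 4, 5, 5, 5, 6, 7, 10]

    print("mean:  ", statistics.mean(scores))      # 5.0
    print("median:", statistics.median(scores))    # 5.0
    print("mode:  ", statistics.mode(scores))      # 5 (most frequent score)

    # A distribution can have more than one mode (bimodal or multimodal).
    print(statistics.multimode([1, 1, 2, 3, 3]))   # [1, 3]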

Selecting a Measure of Central Tendency


You usually can compute two or even three measures of central tendency for the same set of data.
Although the three measures often produce similar results, there are situations in which they are
predictably different. Deciding which measure of central tendency is best to use depends on several
factors.
Whenever the scores are numerical values (interval or ratio scale) the mean is usually the
preferred measure of central tendency. Because the mean uses every score in the distribution, it typically
produces a good representative value. Remember that the goal of central tendency is to find the single
value that best represents the entire distribution. Also, the mean has the added advantage of being closely
related to variance and standard deviation, the most common measures of variability. This relationship
makes the mean a valuable measure for purposes of inferential statistics. For these reasons, and others, the
mean generally is the best of the three measures of central tendency. However, there are specific
situations in which it is impossible to compute a mean or in which the mean is not particularly
representative. It is in these situations that the mode and the median are used.
 When to Use the Median
o Extreme scores: When a distribution has a few extreme scores, scores that are very
different in value from most of the others, then the mean may not be a good
representative of most of the distribution. The problem comes from the fact that one or
two extreme values can have a large influence and cause the mean to be displaced. In this
situation, the fact that the mean uses all the scores equally can be a disadvantage. The
median, on the other hand, is not easily affected by extreme scores.
o Undetermined Values: It is impossible to compute the mean because you have unknown values. The median can still be acquired, as it would divide the scores into halves including the unknown value. The median for the sample below is 12.5, including the undetermined or unknown value.

o Open-ended distributions: A distribution is said to be open-ended when there is no upper limit (or lower limit) for one of the categories. The table below provides an example of
an open-ended distribution, showing the number of pizzas eaten during a 1-month period
for a sample of n = 20 high school students. The top category in this distribution shows
that three of the students consumed “5 or more” pizzas. This is an open-ended category.
Notice that it is impossible to compute a mean for these data because you cannot find ∑X
(the total number of pizzas for all 20 students). However, you can find the median.
Listing the 20 scores in order produces X = 1 and X= 2 as the middle two scores. For
these data, the median is 1.5.
o Ordinal Scale: When scores are measured on an ordinal scale, the median is always
appropriate and is usually the preferred measure of central tendency. The median is
compatible with this type of measurement because it is defined by direction: half of the
scores are above the median and half are below the median.

 When to Use the Mode


o Nominal Scales: The primary advantage of the mode is that it can be used to measure and
describe central tendency for data that are measured on a nominal scale. Recall that the
categories that make up a nominal scale are differentiated only by name, such as
classifying people by occupation or college major. Because nominal scales do not
measure quantity (distance or direction), it is impossible to compute a mean or a median
for data from a nominal scale. Therefore, the mode is the only option for describing
central tendency for nominal data.
o Discrete Variables: Recall that discrete variables are those that exist only in whole,
indivisible categories. Often, discrete variables are numerical values, such as the number
of children in a family or the number of rooms in a house. When these variables produce
numerical scores, it is possible to calculate means. However, the calculated means are
usually fractional values that cannot exist. For example, computing means will generate
results such as “the average family has 2.4 children and a house with 5.33 rooms.” The
mode, on the other hand, always identifies an actual score (the most typical case) and,
therefore, it produces more sensible measures of central tendency. Using the mode, our
conclusion would be “the typical, or modal, family has 2 children and a house with 5
rooms.” In many situations, especially with discrete variables, people are more
comfortable using the realistic, whole-number values produced by the mode.

Measures of Variability and Position

A. MEASURES OF VARIABILITY
The term variability has much the same meaning in statistics as it has in everyday language; to say that things are variable means that they are not all the same. In statistics, our goal is to measure the amount of variability for a particular set of scores, a distribution. In simple terms, if the scores in a distribution are all the same, then there is no variability. If there are small differences between scores, then the variability is small (homogeneous), and if there are large differences between scores, then the variability is large (heterogeneous).
Variability provides a quantitative measure of the differences between scores in a distribution and
describes the degree to which the scores are spread out or clustered together.
The figure below shows two distributions of familiar values for the population of adult males:
part (a) shows the distribution of men’s heights (in inches), and part (b) shows the distribution of men’s
weights (in pounds). Notice that the two distributions differ in terms of central tendency. The mean height
is 70 inches (5 feet, 10 inches) and the mean weight is 170 pounds. In addition, notice that the
distributions differ in terms of variability. For example, most heights are clustered close together, within 5
or 6 inches of the mean. On the other hand, weights are spread over a much wider range. In the weight
distribution it is not unusual to find individuals who are located more than 30 pounds away from the
mean, and it would not be surprising to find two individuals whose weights differ by more than 30 or 40
pounds.

The purpose for measuring variability is to obtain an objective measure of how the scores are
spread out in a distribution. In general, a good measure of variability serves two purposes:
1. Variability describes the distribution. Specifically, it tells whether the scores are clustered close
together or are spread out over a large distance. Usually, variability is defined in terms of
distance. It tells how much distance to expect between one score and another, or how much
distance to expect between an individual score and the mean. For example, we know that the
heights for most adult males are clustered close together, within 5 or 6 inches of the average.
Although more extreme heights exist, they are relatively rare.
2. Variability measures how well an individual score (or group of scores) represents the entire
distribution. This aspect of variability is very important for inferential statistics, in which
relatively small samples are used to answer questions about populations. For example, suppose
that you selected a sample of one person to represent the entire population. Because most adult
males have heights that are within a few inches of the population average (the distances are
small), there is a very good chance that you would select someone whose height is within 6
inches of the population mean. On the other hand, the scores are much more spread out (greater
distances) in the distribution of weights. In this case, you probably would not obtain someone
whose weight was within 6 pounds of the population mean. Thus, variability provides
information about how much error to expect if you are using a sample to represent a population.
Range
Range is the distance covered by the scores in a distribution, from the smallest score to the largest
score.

Characteristics of the Range


The range is probably the most obvious way to describe how spread out the scores are—simply
find the distance between the maximum and the minimum scores. The problem with using the range as a
measure of variability is that it is completely determined by the two extreme values and ignores the other
scores in the distribution. Thus, a distribution with one unusually large (or small) score will have a large
range even if the other scores are all clustered close together. Because the range does not consider all the
scores in the distribution, it often does not give an accurate description of the variability for the entire
distribution. For this reason, the range is a crude and unreliable measure of variability.

Standard Deviation
The standard deviation is the most used and the most important measure of variability. Standard
deviation uses the mean of the distribution as a reference point and measures variability by considering
the distance between each score and the mean. In simple terms, the standard deviation provides a measure
of the standard, or average, distance from the mean, and describes whether the scores are clustered closely
around the mean or are widely scattered.
Characteristics of the Standard Deviation
The standard deviation has many important characteristics. First, the standard deviation gives us a
measure of dispersion relative to the mean. This differs from the range, which gives us an absolute
measure of the spread between the two most extreme scores. Second, the standard deviation is sensitive to
each score in the distribution. If a score is moved closer to the mean, then the standard deviation will
become smaller. Conversely, if a score shifts away from the mean, then the standard deviation will
increase. Third, like the mean, the standard deviation is stable with regard to sampling fluctuations. If
samples were taken repeatedly from populations of the type usually encountered in the behavioral
sciences, the standard deviation of the samples would vary much less from sample to sample than the
range. This property is one of the main reasons why the standard deviation is used so much more often
than the range for reporting variability.

Variance
The variance of a set of scores is just the square of the standard deviation or the average squared
distance from the mean.

Characteristics of the Variance


The variance is not used much in descriptive statistics because it gives us squared units of
measurement. However, it is used quite frequently in inferential statistics.
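A minimal sketch computing all three measures with Python's statistics module (the sample data are invented; pstdev and pvariance treat the data as a whole population, while stdev and variance would treat it as a sample):

    import statistics

    scores = [4, 6, 7, 7, 8, 9, 10, 13]

    range_ = max(scores) - min(scores)   # distance from smallest to largest
    sd = statistics.pstdev(scores)       # population standard deviation
    var = statistics.pvariance(scores)   # variance = the SD squared

    print("range:   ", range_)           # 9
    print("SD:      ", round(sd, 2))     # 2.55
    print("variance:", round(var, 2))    # 6.5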

B. MEASURES OF POSITION
A measure of position determines the standing or position of a single value (score) in relation to the other values or scores in a sample or population data set. Measures of position give us a way to see where a certain data point or value falls in a sample or distribution. A measure can tell us whether a value is about average, or whether it is unusually high or low. Measures of position are used for quantitative data that fall on some numerical scale. Sometimes, measures can be applied to ordinal variables—those variables that have an order, like first, second, …, fiftieth.

Quartile
A distribution of test scores or data can be divided into four parts such that 25% of the test scores
occur in each quarter. Thus, the first quartile cuts off the lowest 25%, the second quartile cuts off the
lowest 50%, and the third quartile cuts off the lowest 75%. (Note that the second quartile is also the
median.). The quartiles/quartile points represent the dividing points between the four quarters in the
distribution. There are three of them, respectively labeled Q1, Q2, and Q3. To differentiate, quartile refers
to a specific point whereas quarter refers to an interval. An individual score may, for example, fall at the
third quartile or in the third quarter (but not “in” the third quartile or “at” the third quarter).
Simply put, quartiles divide your data into quarters: the lowest quarter, two middle quarters, and a
highest quarter.

Percentile
A percentile is a ranking that conveys information about the relative position of a score within a
distribution of scores. More formally defined, a percentile is an expression of the percentage of people
whose score on a test or measure falls below a particular raw score. Percentile is used extensively in
education to compare the performance of an individual to that of a reference group.
Percentile is a measure that divides the distribution into 100 equal parts. Further, it is the value on the
measurement scale below which a specified percentage of the scores in the distribution fall. For example,
if 80th percentile (P80) = 65, then 80% of all examinees scored below 65.
Percentile is also used as a standard score. A suggested guide for interpretation can be seen
below.
PERCENTILE INTERPRETATION
98-99 Very Superior
93-97 Superior
75-92 Above Average
25-74 Average
9-24 Below Average
2-8 Borderline
1 Impaired (Mentally Deficient)
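Both quartiles and percentiles can be sketched with statistics.quantiles (the data are invented; n=4 gives quartiles, n=100 gives percentile cut points, and the default "exclusive" method is one of several common conventions):

    import statistics

    scores = [45, 52, 58, 61, 63, 65, 68, 70, 74, 79, 83, 90]

    q1, q2, q3 = statistics.quantiles(scores, n=4)   # quartiles (Q2 is the median)
    print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")

    # Percentiles: n=100 returns the 99 cut points P1 through P99.
    p = statistics.quantiles(scores, n=100)
    print("P80 =", p[79])   # the value below which 80% of scores fall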

Correlation

I. Definition of Correlation
Correlation is a statistical technique that is used to measure and describe the relationship
between two variables. Usually, the two variables are simply observed as they exist naturally in the
environment—there is no attempt to control or manipulate the variables. For example, a researcher could
check high school records (with permission) to obtain a measure of each student’s academic performance,
and then survey each family to obtain a measure of income. The resulting data could be used to determine
whether there is a relationship between high school grades and family income. Notice that the researcher is
not manipulating any student’s grade or any family’s income but is simply observing what occurs
naturally. You also should notice that a correlation requires two scores for each individual (one score
from each of the two variables). These scores normally are identified as X and Y. The pairs of scores can
be listed in a table, or they can be presented graphically in a scatter plot. In the scatter plot, the values for
the X variable are listed on the horizontal axis and the Y values on the vertical axis. Each individual is represented by a single point in the graph, so that the horizontal position corresponds to the individual’s X
value and the vertical position corresponds to the Y value. The value of a scatter plot is that it allows you
to see any patterns or trends that exist in the data. The scores in the figure below, for example, show a
clear relationship between family income and student grades: as income increases, grades also increase.
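For instance, a minimal matplotlib sketch of such a scatter plot, using fictitious income and grade data:

    import matplotlib.pyplot as plt

    # Fictitious paired scores: X = family income (in $1000s), Y = high school GPA.
    income = [31, 38, 42, 44, 49, 56, 58, 65, 70, 90]
    gpa = [2.0, 2.3, 2.1, 2.6, 2.5, 2.9, 2.7, 3.3, 3.5, 3.7]

    # One point per individual: horizontal position = X value, vertical position = Y value.
    plt.scatter(income, gpa)
    plt.xlabel("Family income ($1000s)")
    plt.ylabel("High school grade point average")
    plt.show()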

II. Characteristics of a Relationship


A powerful graphical technique for demonstrating correlational relationships is the scatterplot. This plots pairs of values, one for variable X and the other for variable Y, on a chart, thus demonstrating the scattering of paired values. The extent to which pairs of readings are not scattered randomly on the diagram, but form a consistent pattern, is a sign of the strength of the relationship. The scatterplots below demonstrate this. They represent fictitious data from one person, taken after each trial on a simulated driving task. The trials represent an operational measure of practice: the greater the practice, the better the performance and the shorter the time taken. However, the amount of practice is not related to the amount spoken during the trials at all.

A correlation is a numerical value that describes and measures three characteristics of the relationship between X and Y. These three characteristics are as follows:

1. Direction of the Relationship


The sign of the correlation, positive or negative, describes the direction of the
relationship. In a positive/direct correlation, the two variables tend to change in the same
direction: as the value of the X variable increases from one individual to another, the Y variable
also tends to increase; when the X variable decreases, the Y variable also decreases. In a negative
correlation, the two variables tend to go in opposite directions. As the X variable increases, the
Y variable decreases. That is, it is an inverse relationship.
The two scatterplots below provide examples of positive and negative relationships. For
the men, there is a positive relationship between weight and income, with income increasing as
weight increases from person to person. For the women, the relationship is negative, with income
decreasing as weight increases.
2. Form of the Relationship
In the preceding income and weight examples, the relationships tend to have a linear form; that is, the points in the scatter plot tend to cluster around a straight line. We have drawn a line through the middle of the data points in each figure to help show the relationship. The most common use of correlation is to measure straight-line relationships (linear: figures a & b). However, other forms of relationships do exist, and there are special correlations used to measure them (non-linear: figures d & e).

3. Strength/Magnitude/Consistency of the Relationship


The correlation measures the consistency of the relationship. For a linear relationship, for
example, the data points could fit perfectly on a straight line. Every time X increases by one point, the
value of Y also changes by a consistent and predictable amount. The consistency of the relationship is
measured by the numerical value of the correlation. A perfect correlation always is identified by a
correlation of 1.00 and indicates a perfectly consistent relationship. For a correlation of 1.00 (or –
1.00), each change in X is accompanied by a perfectly predictable change in Y. At the other extreme,
a correlation of 0 indicates no consistency at all. For a correlation of 0, the data points are scattered
randomly with no clear trend, just like figure c below. Intermediate values between 0 and 1 indicate
the degree of consistency.

Estimates for Strength of Correlation (Sevilla, 1994)


Correlation Coefficient    Strength of Relationship
±1.00                      Perfect correlation
±0.80 – ±0.99              Very strong relationship
±0.60 – ±0.79              Strong relationship
±0.40 – ±0.59              Moderate relationship
±0.20 – ±0.39              Weak relationship
±0.00 – ±0.19              No relationship

TAKE NOTE:
The sign (+ or –) and the strength of a correlation are independent. For example, a correlation of 1.00
indicates a perfectly consistent relationship whether it is positive (+1.00) or negative (–1.00). Similarly,
correlations of +0.80 and –0.80 are equally consistent relationships. Finally, you should notice that a
correlation can never be larger than +1.00 or smaller than –1.00. If your calculations produce a value
outside this range, then you should realize immediately that you have made a computational error.
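A small sketch that turns the table and the note above into code (the thresholds follow Sevilla, 1994; the function name is our own):

    def strength_label(r):
        """Verbal label for a correlation coefficient, per Sevilla (1994)."""
        magnitude = abs(r)                 # sign and strength are independent
        if magnitude > 1.0:
            raise ValueError("a correlation can never exceed 1.00 in magnitude")
        if magnitude == 1.0:
            return "Perfect correlation"
        if magnitude >= 0.80:
            return "Very strong relationship"
        if magnitude >= 0.60:
            return "Strong relationship"
        if magnitude >= 0.40:
            return "Moderate relationship"
        if magnitude >= 0.20:
            return "Weak relationship"
        return "No relationship"

    print(strength_label(-0.85))           # Very strong relationship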

III. Pearson Product-Moment Correlation


The most common correlation is the Pearson correlation (or the Pearson product–moment
correlation), which measures the degree of straight-line relationship. The Pearson correlation measures
the degree and the direction of the linear relationship between two variables. The Pearson correlation for a
sample is identified by the letter r. The corresponding correlation for the entire population is identified
by the Greek letter rho (ρ), which is the Greek equivalent of the letter r.
When there is a perfect linear relationship, every change in the X variable is accompanied by a
corresponding change in the Y variable. In figure a, for example, every time the value of X increases,
there is a perfectly predictable decrease in the value of Y. The result is a perfect linear relationship, with
X and Y always varying together. In this case, the covariability (X and Y together) is identical to the
variability of X and Y separately, and the formula produces a correlation of +1.00 or –1.00. At the other extreme (figure c), when there is no linear relationship, a change in the X variable does
not correspond to any predictable change in the Y variable. In this case, there is no covariability, and the
resulting correlation is zero.
Pearson’s correlation is a statistical procedure that measures the degree of relationship between two variables. It is used when both variables are measured on an interval or ratio scale, and the data should be drawn from near-normal distributions.
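A minimal sketch of the usual computation, expressing the covariability of X and Y relative to the variability of each (the sample data are made up, and the notation in the textbook's own formula may differ):

    import math

    def pearson_r(x, y):
        """Pearson r: covariability of X and Y relative to the variability of each."""
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))   # sum of products
        ss_x = sum((xi - mean_x) ** 2 for xi in x)                        # sum of squares, X
        ss_y = sum((yi - mean_y) ** 2 for yi in y)                        # sum of squares, Y
        return sp / math.sqrt(ss_x * ss_y)

    x = [1, 2, 3, 4, 5]
    y = [2, 4, 6, 8, 10]          # perfectly linear relationship
    print(pearson_r(x, y))        # 1.0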

IV. Where and Why Correlation is Used


1. Prediction
If two variables are known to be related in some systematic way, it is possible to use one of the
variables to make accurate predictions about the other. For example, when you applied for admission to
college, you were required to submit a great deal of personal information, including your scores on the Scholastic Aptitude Test (SAT). College officials want this information so they can predict your
chances of success in college. It has been demonstrated over several years that SAT scores and college
grade point averages are correlated. Students who do well on the SAT tend to do well in college; students
who have difficulty with the SAT tend to have difficulty in college. Based on this relationship, college
admissions officers can make a prediction about the potential success of each applicant. You should note
that this prediction is not perfectly accurate. Not everyone who does poorly on the SAT will have trouble
in college. That is why you also submit letters of recommendation, high school grades, and other
information with your application. The process of using relationships to make predictions is called
regression and is discussed in the next chapter.

2. Validity
Suppose a psychologist develops a new test for measuring intelligence. How could you show that this
test truly measures what it claims; that is, how could you demonstrate the validity of the test? One
common technique for demonstrating validity is to use a correlation. If the test actually measures
intelligence, then the scores on the test should be related to other measures of intelligence—for example,
standardized IQ tests, performance on learning tasks, problem-solving ability, and so on. The
psychologist could measure the correlation between the new test and each of these other measures of
intelligence to demonstrate that the new test is valid.

3. Reliability
In addition to evaluating the validity of a measurement procedure, correlations are used to determine
reliability. A measurement procedure is considered reliable to the extent that it produces stable, consistent
measurements. That is, a reliable measurement procedure will produce the same (or nearly the same)
scores when the same individuals are measured twice under the same conditions. For example, if your IQ
were measured as 113 last week, you would expect to obtain nearly the same score if your IQ were
measured again this week. One way to evaluate reliability is to use correlations to determine the
relationship between two sets of measurements. When reliability is high, the correlation between two
measurements should be strong and positive.
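For example, a brief sketch of this test–retest idea with hypothetical IQ scores (statistics.correlation requires Python 3.10+):

    import statistics

    # Hypothetical IQ scores for the same five people, measured twice, a week apart.
    week1 = [113, 98, 105, 122, 90]
    week2 = [111, 101, 104, 125, 93]

    # High test-retest reliability shows up as a strong positive correlation.
    print(statistics.correlation(week1, week2))   # roughly 0.98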

4. Theory Verification
Many psychological theories make specific predictions about the relationship between two variables.
For example, a theory may predict a relationship between brain size and learning ability; a developmental
theory may predict a relationship between the parents’ IQs and the child’s IQ; a social psychologist may
have a theory predicting a relationship between personality type and behavior in a social situation. In each
case, the prediction of the theory could be tested by determining the correlation between the two
variables.

V. Considerations in Interpreting Correlations


1) Correlation simply describes a relationship between two variables. It does not explain why the
two variables are related. Specifically, a correlation should not and cannot be interpreted as proof
of a cause-and-effect relationship between the two variables.
2) The value of a correlation can be affected greatly by the range of scores represented in the data.
3) One or two extreme data points, often called outliers, can have a dramatic effect on the value of a
correlation.
4) When judging how “good” a relationship is, it is tempting to focus on the numerical value of the
correlation. For example, a correlation of +0.5 is halfway between 0 and 1.00 and therefore
appears to represent a moderate degree of relationship. However, a correlation should not be
interpreted as a proportion. Although a correlation of 1.00 does mean that there is a 100%
perfectly predictable relationship between X and Y, a correlation of .5 does not mean that you can
make predictions with 50% accuracy. To describe how accurately one variable predicts the other,
you must square the correlation. Thus, a correlation of r = .5 means that one variable partially predicts the other, but the predictable portion is only r² = .5² = 0.25 (or 25%) of the total variability.
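As a quick sketch of the squaring rule in point 4, for several correlation values:

    # To see how accurately one variable predicts the other, square the correlation.
    for r in (0.1, 0.5, 0.8, 1.0):
        print(f"r = {r:.1f} -> r^2 = {r * r:.2f} ({r * r:.0%} of the total variability)")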

VI. Spearman Rank-Order Correlation


There are times when the data we are using cannot be considered valid interval-level data, and/or one of the variables might be very skewed. In these cases we can use Spearman’s rho, a non-parametric measure of correlation that works with ordinal-level data. Spearman’s rho is the Pearson
correlation coefficient that would occur if we used the ranks of the scores on the two paired variables as
the data, rather than the scores themselves. Normally, for a strong correlation we expect that each person
will score much the same value on one variable as they do on the other. It makes sense to say also that
each person’s position (i.e., their rank) on one variable will be similar to their position on the other. If
your first psychology essay received the third highest mark in the class, you might expect that your
second essay mark will also be close to third highest.
The rationale for the Spearman correlation coefficient is that differences between all pairs of
ranks will be small when there is a high positive correlation. If a class of psychology students took two
class tests on one day, both tests based on the same material that should have been learned, then we would
expect that people who have the higher ranks on one test should also have a high rank on the other test;
people with a low rank on one should also receive a low rank on the other. This, then, means that the
difference between each person’s pair of ranks should be low or even zero, if there is a strong correlation.
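A minimal sketch of that definition: rank each variable, then correlate the ranks. The ranking helper assumes no tied scores, the test marks are hypothetical, and statistics.correlation requires Python 3.10+:

    import statistics

    def ranks(values):
        """Rank scores from 1 (lowest) to n; assumes no tied scores, for simplicity."""
        ordered = sorted(values)
        return [ordered.index(v) + 1 for v in values]

    # Hypothetical marks for the same five students on two class tests.
    test1 = [78, 62, 90, 55, 70]
    test2 = [75, 60, 88, 58, 66]

    # Spearman's rho is Pearson's r computed on the ranks rather than the raw scores.
    print(statistics.correlation(ranks(test1), ranks(test2)))   # 1.0: identical rank orders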

Regression
Regression and correlation are closely related. At the most basic level, they both involve the
relationship between two variables, and they both utilize the same set of basic data: paired scores taken
from the same or matched subjects. As we saw in Lesson 6, correlation is concerned with the magnitude
and direction of the relationship. Regression focuses on using the relationship for prediction. Prediction is
quite easy when the relationship is perfect. If the relationship is perfect, all the points fall on a straight
line and all we need do is derive the equation of the straight line and use it for prediction. As you might
guess, when the relationship is perfect, so is prediction. All predicted values are exactly equal to the
observed values and prediction error equals zero. The situation is more complicated when the relationship
is imperfect.
When the relationship is imperfect, it can be hard to determine a line that best fits the data. Look
at the table and scatterplot below. The relationship is imperfect, positive, and linear. If you trace lines through the scatterplot, many different lines could plausibly be drawn through the points. This is the problem we face in prediction: how to determine the single straight line that best describes the data. The
solution most often used is to construct the line that minimizes errors of prediction according to a least-
squares criterion. Appropriately, this line is called the least-squares regression line. To summarize,
regression is a technique that uses the relationship between two variables for prediction. The least-squares
regression line is the prediction line that minimizes the total error of prediction.

You should first be familiar with the equation of the least-squares regression line:

Y' = bY·X + aY

where:
Y' = predicted or estimated value of Y
bY = slope of the line for minimizing errors in predicting Y
aY = Y-axis intercept for minimizing errors in predicting Y
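A minimal sketch of fitting this line with the standard least-squares formulas (slope = sum of products over sum of squares for X; the line passes through the point of means); the paired data are made up:

    def least_squares_line(x, y):
        """Slope bY and intercept aY that minimize squared errors in predicting Y."""
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
                 / sum((xi - mean_x) ** 2 for xi in x))
        intercept = mean_y - slope * mean_x    # line passes through (mean of X, mean of Y)
        return slope, intercept

    x = [1, 2, 3, 4, 5]
    y = [2, 3, 5, 4, 6]                        # imperfect, positive, linear
    b, a = least_squares_line(x, y)
    print(b, a)                                # 0.9 and 1.3
    print(b * 6 + a)                           # predicted Y' for a new X of 6 -> 6.7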
