
How to Do Reliability Analysis and Basic Factor Analysis in R

David Condon
Northwestern University

In this document, we are going to use R to do a reliability analysis, a basic factor
analysis, and an IRT-based factor analysis. At the time of this writing, I’m using R version
2.13.0 on a Mac (OS X 10.6.8). You will also need to have the ‘psych’ package loaded
in R, as we’ll be using many of its commands. I’m currently running ‘psych’ version 1.1.09.

Getting Data from Excel into R


As is often the case with R, the hardest part of this assignment will probably be
getting the data ready for analysis. In this example, I’ll be importing data from an Excel
file into R - a task that is frequently annoying for R users. The most common issue in my
experience stems from the fact that R will not recognize some aspect of the way the data
is formatted in Excel. When I try to paste in data (using the clipboard), I often get an
error message in R that looks something like this:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 1 did not have 859 elements

The data file I happen to be working with in this example has missing data that is coded
as #NULL!, which is not an acceptable format for missing data in R. So this will have to
be fixed before we can move on. There are many possible ways to do this. I chose to deal
with it directly in Excel. All my data is stored in the first sheet of the Excel file (‘Sheet1’),
so I clicked over to Sheet2 and wrote this formula in cell A1:

=IF('Sheet1'!A1="#NULL!","NA",'Sheet1'!A1)

This pulls the value from Sheet1 into the same location in Sheet2 and only changes the cell
if it has missing data. You have to copy and paste this formula into all of the relevant cells
until Sheet2 looks the same as Sheet1 (except for the missing data changes).

Next we need to save the data as a file in a different format. We want to use the .csv
file extension instead of the .xls or .xlsx defaults (csv stands for ‘comma separated values’).
Excel won’t let you save in the .csv format if you have data on more than one sheet. So...
select all the data on Sheet2, copy it to the clipboard (Ctrl+C or use the copy button in
Excel), open a new workbook in Excel and use the ‘Paste Special...’/‘Values’ command to
paste all the data into the first sheet of the new workbook. The ‘Paste Special...’ option is
accessed through the ‘Edit’ drop-down menu in Excel. You have to click the ‘values’ button
from the menu that pops up and then hit enter. (This is an incredibly useful tool in Excel
by the way). Now you can save the new workbook as a .csv file by choosing that option
from the window that pops up when you select save (or hit Ctrl+S).
Now we’re finally ready to paste the data into R. Open R or toggle over to it if it’s
already open. If the ‘psych’ package is not already loaded, use this command to do it:
> library(psych)
Now you need to select all the data in the .csv file that you just created and copy it to the
clipboard (note that you can do this by opening the file in Excel but you will probably have
better luck if you open it with some other program that works with .csv files, especially
if you’re working with large amounts of data - I recommend TextEdit or Notepad). With
the data on the clipboard, toggle back to R and paste the data by typing in the following
command (if you copy it from this page you will obviously write over the data on the
clipboard):
> ipip <- read.clipboard.csv()
In the command above, ‘ipip’ is just the name we gave the data. You can call it whatever
you want (more or less - the name can’t start with a number, and it’s best to avoid the names
of existing R functions). You’ll know you imported the data if R doesn’t complain. If you actually
want to check that it worked, you’ll have to ask R to tell you. The ‘dim’ command shows
you how many rows and columns you have in the data. Check that the number of columns
and rows in the Excel file matches the data in R by using ‘dim()’ now:
> dim(ipip)
[1] 795 859
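By the way, if the clipboard route keeps giving you trouble, you can skip it and read the
.csv file straight from disk with read.csv(). Its na.strings argument will also turn the
#NULL! codes into NA for you, so the Sheet2 workaround becomes optional. The path below is
just a placeholder for wherever you saved your file - this is a sketch of the alternative,
not part of the workflow above:
> ipip <- read.csv("/path/to/ipip.csv", header = TRUE, na.strings = c("#NULL!", "NA"))
> dim(ipip)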
Easy, right? You should probably save the data in an ‘rdata’ file so you don’t have
to go through that ordeal again. Use a command like the one below. Be sure to change the
path after ‘file’ to suit the directory on your computer. Also, be sure that the file ends with
the extension ‘.rdata’.
> save(ipip, file="/Users/DC/DC stuff/academics/2011-2012 3rd year/Fall 2011/
+ ipip.rdata")
When you want to open the file later, you can do so by using the load command. We don’t
need it now, but it would look like this:
> load("/Users/DC/DC stuff/academics/2011-2012 3rd year/Fall 2011/ipip.rdata")

Cleaning Up the Data


In this example, only a couple of steps are needed to ‘clean up’ the data. It is more
common that many steps are needed. See the other how-to’s on this website if you need
ideas or help.
The first thing to do is examine the data. There are many ways to do this. One
method is to use the ‘describe’ command. Try something like:
> describe(ipip)
You’ll get a table of descriptive statistics that will help to identify “weirdness” in the data
set. Be sure to check that the ranges look right for every variable (each row of the output). Fix whatever weirdness needs
fixing by cleaning the data before you proceed.
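For example, suppose the items in this data set were administered on a 1 to 6 response scale
(an assumption here - check your own codebook). In that case, any value outside that range is
almost certainly a coding error and could be recoded as missing with something like the sketch
below, which assumes every column is a numeric item response:
> ipip[!is.na(ipip) & (ipip < 1 | ipip > 6)] <- NA  # flag out-of-range cells as missing
> describe(ipip)  # re-check the ranges afterwards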
In our case, the data looks OK but we’re only interested in a subset of the 859
variables. In our data, each column represents a different item that was administered in a
self-report personality questionnaire and our assignment is to develop a scale of items that
can be used to assess a specific construct. We have already identified the item numbers we
want to consider for our scale based on theory, so we need to create a list of these items
according to their column name. Let’s call the list ‘keep’.
> keep <- c("h648", "h356", "h985", "h14", "h1", "h914", "h738", "h682",
+ "h672", "h864", "h660", "h1157", "h1174", "h271", "h2014", "h905",
+ "h1366", "h33", "h592", "h1393", "h41", "h589", "h655", "h2004",
+ "h598", "h218", "h1346", "h965", "h2013", "h897", "h154", "h974", "h24")
Now we’ll use this list to pull out the columns we want and call the smaller data set by a
new name. Let’s use something that relates to our proposed construct - avoidant personality
disorder. And we should check that the dimensions of the smaller set look right - it should
have the same number of rows as before but fewer columns.
> avpd <- ipip[, keep]
> dim(avpd)
[1] 795 33
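If the ipip[, keep] line had complained about ‘undefined columns selected’, the likely culprit
would be a typo in ‘keep’. A quick way to see which names in the list have no matching column
(just a troubleshooting aside, not part of the assignment) is:
> keep[!keep %in% colnames(ipip)]  # returns any item names that don't appear in the data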

Reliability Analysis, Factor Analysis & IRT-based Factor Analysis


The generic reliability analysis is very straightforward. The command is given
below. If you need help making sense of the output, use the help page for the function (by
entering the command ‘?alpha’ in R).
> ra.avpd <- alpha(avpd)
> ra.avpd
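One caveat worth knowing about: if some of your items are reverse-keyed, the raw alpha can come
out misleadingly low. The alpha() function has a check.keys argument that flags items which
correlate negatively with the total score and reverses them automatically. Whether you want that
handled automatically is a judgment call, so treat this as an optional variant (the object name
here is just an example):
> ra.avpd.r <- alpha(avpd, check.keys = TRUE)
> ra.avpd.r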
Factor analysis is easy too. If you don’t want to use IRT, you can run a simple
factor analysis on the original data matrix. Here we only call for one factor. In most cases,
we would evaluate the factor structure by comparing the output when we ask for different
numbers of factors (among other things). But we’re not gonna get into all that here...
> avpd.1 <- fa(avpd, nfactors = 1)
> avpd.1
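If you do want to think about how many factors to extract, one common starting point is parallel
analysis, which the ‘psych’ package provides through fa.parallel(). The sketch below is just a
pointer in that direction, not something we need for this assignment:
> fa.parallel(avpd)  # scree plot plus parallel analysis to suggest a number of factors
> avpd.2 <- fa(avpd, nfactors = 2)  # e.g., a two-factor solution to compare against avpd.1
> avpd.2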
We can also do an IRT-based factor analysis. The irt.fa function will first find the
tetrachoric/polychoric matrix based on the original data frame (this will take a while - if
you have a lot of data and/or a slow machine, it could take a long while). Then it will use
that matrix to perform the factor analysis. Here again, it would almost always make sense
to consider the possibility that the data can be better explained by two or more factors but
we aren’t going to do that now.
> avpd.irt1 <- irt.fa(avpd, fm = "gls")
> avpd.irt1
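The irt.fa output is easier to digest graphically. You can also plot the saved object to get item
information curves, and item characteristic curves via the type argument - though the available
plotting options vary somewhat across ‘psych’ versions, so treat this as a sketch:
> plot(avpd.irt1, type = "IIC")  # item information curves
> plot(avpd.irt1, type = "ICC")  # item characteristic curves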

Finally, we’re going to save our work again so we can run more analyses or look at
the output later if we want. The command is the same as the one we used before but we
have to add the additional objects we want to save.

> save(ipip, ra.avpd, avpd.1, avpd.irt1, file="/Users/DC/DC stuff/academics/
+ 2011-2012 3rd year/Fall 2011/ipip.rdata")

The end.
