Using R With Multivariate Statistics by Randall E. Schumacker


Using R With Multivariate Statistics

To Joanne

For all her love and support while writing the book.

Using R With Multivariate Statistics

Randall E. Schumacker
University of Alabama

FOR INFORMATION:

SAGE Publications, Inc.

2455 Teller Road

Thousand Oaks, California 91320

E-mail: order@sagepub.com

SAGE Publications Ltd.

1 Oliver’s Yard

55 City Road

London EC1Y 1SP

United Kingdom

SAGE Publications India Pvt. Ltd.

B 1/I 1 Mohan Cooperative Industrial Area

Mathura Road, New Delhi 110 044

India

SAGE Publications Asia-Pacific Pte. Ltd.

3 Church Street

#10-04 Samsung Hub

Singapore 049483

Copyright © 2016 by SAGE Publications, Inc.

All rights reserved. No part of this book may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying, recording, or by any information storage and retrieval
system, without permission in writing from the publisher.

All trademarks depicted within this book, including trademarks appearing as part of a screenshot, figure, or
other image are included solely for the purpose of illustration and are the property of their respective holders.
The use of the trademarks in no way indicates any relationship with, or endorsement by, the holders of said
trademarks. SPSS is a registered trademark of International Business Machines Corporation.
Printed in the United States of America

Library of Congress Cataloging-in-Publication Data

Schumacker, Randall E.

Using R with multivariate statistics : a primer / Randall E. Schumacker, University of Alabama, Tuscaloosa.

pages cm

Includes bibliographical references and index.

ISBN 978-1-4833-7796-4 (pbk. : alk. paper)

1. Multivariate analysis—Data processing. 2. R (Computer program language) 3. Statistics—Data processing. I. Title.

QA278.S37 2016

519.5’3502855133—dc23   2015011814

This book is printed on acid-free paper.

Acquisitions Editor: Vicki Knight

Editorial Assistant: Yvonne McDuffee

eLearning Editor: Katie Bierach

Production Editor: Kelly DeRosa

Copy Editor: QuADS Prepress (P) Ltd.

Typesetter: C&M Digitals (P) Ltd.

Proofreader: Jennifer Grubba

Indexer: Michael Ferreira

Cover Designer: Michelle Kenny

Marketing Manager: Nicole Elliott

Detailed Contents
Preface
Acknowledgments
About the Author
1. Introduction and Overview
Background
Persons of Interest
Factors Affecting Statistics
R Software
Web Resources
References
2. Multivariate Statistics: Issues and Assumptions
Issues
Assumptions
Normality
Determinant of a Matrix
Equality of Variance–Covariance Matrix
Box M Test
SPSS Check
Summary
Web Resources
References
3. Hotelling’s T2: A Two-Group Multivariate Analysis
Overview
Assumptions
Univariate Versus Multivariate Hypothesis
Statistical Significance
Practical Examples Using R
Single Sample
Two Independent Group Mean Difference
Two Groups (Paired) Dependent Variable Mean Difference
Power and Effect Size
A Priori Power Estimation
Effect Size Measures
Reporting and Interpreting
Summary
Exercises
Web Resources

References
4. Multivariate Analysis of Variance
MANOVA Assumptions
Independent Observations
Normality
Equal Variance–Covariance Matrices
Summary
MANOVA Example: One-Way Design
MANOVA Example: Factorial Design
Effect Size
Reporting and Interpreting
Summary
Exercises
Web Resources
References
5. Multivariate Analysis of Covariance
Assumptions
Multivariate Analysis of Covariance
MANCOVA Example
Dependent Variable: Adjusted Means
Reporting and Interpreting
Propensity Score Matching
Summary
Web Resources
References
6. Multivariate Repeated Measures
Assumptions
Advantages of Repeated Measure Design
Multivariate Repeated Measure Examples
Single Dependent Variable
Several Dependent Variables: Profile Analysis
Doubly Multivariate Repeated Measures
Reporting and Interpreting Results
Summary
Exercises
Web Resources
References
7. Discriminant Analysis
Overview
Assumptions

Dichotomous Dependent Variable
Box M Test
Classification Summary
Chi-Square Test
Polytomous Dependent Variable
Box M Test
Classification Summary
Chi-Square Test
Effect Size
Reporting and Interpreting
Summary
Exercises
Web Resources
References
8. Canonical Correlation
Overview
Assumptions
R Packages
CCA Package
yacca Package
Canonical Correlation Example
Effect Size
Reporting and Interpreting
Summary
Exercises
Web Resources
References
9. Exploratory Factor Analysis
Overview
Types of Factor Analysis
Assumptions
Factor Analysis Versus Principal Components Analysis
EFA Example
R Packages
Data Set Input
Sample Size Adequacy
Number of Factors and Factor Loadings
Factor Rotation and Extraction: Orthogonal Versus Oblique Factors
Factor Scores
Graphical Display

Reporting and Interpreting
Summary
Exercises
Web Resources
References
Appendix: Attitudes Toward Educational Research Scale
10. Principal Components Analysis
Overview
Assumptions
Bartlett Test (Sphericity)
KMO Test (Sampling Adequacy)
Determinant of Correlation Matrix
Basics of Principal Components Analysis
Principal Component Scores
Principal Component Example
R Packages
Data Set
Assumptions
Number of Components
Reporting and Interpreting
Summary
Exercises
Web Resources
References
11. Multidimensional Scaling
Overview
Assumptions
Proximity Matrix
MDS Model
MDS Analysis
Sample Size
Variable Scaling
Number of Dimensions
R Packages
Goodness-of-Fit Index
MDS Metric Example
MDS Nonmetric Example
Reporting and Interpreting Results
Summary
Exercises

Web Resources
References
12. Structural Equation Modeling
Overview
Assumptions
Multivariate Normality
Positive Definite Matrix
Equal Variance–Covariance Matrices
Correlation Versus Covariance Matrix
Basic Correlation and Covariance Functions
Matrix Input Functions
Reference Scaling in SEM Models
R Packages
Finding R Packages and Functions
SEM Packages
CFA Models
Basic Model
Multiple Group Model
Structural Equation Models
Basic SEM Model
Longitudinal SEM Models
Reporting and Interpreting Results
Summary
Exercises
Web Resources
References
Statistical Tables
Table 1: Areas Under the Normal Curve (z Scores)
Table 2: Distribution of t for Given Probability Levels
Table 3: Distribution of r for Given Probability Levels
Table 4: Distribution of Chi-Square for Given Probability Levels
Table 5: The F Distribution for Given Probability Levels (.05 Level)
Table 6: The Distribution of F for Given Probability Levels (.01 Level)
Table 7: Distribution of Hartley F for Given Probability Levels
Chapter Answers
R Installation and Usage
R Packages, Functions, Data Sets, and Script Files
Index

Preface

The book Using R With Multivariate Statistics was written to supplement existing full textbooks on the various
multivariate statistical methods. The multivariate statistics books provide a more in-depth coverage of the
methods presented in this book, but without the use of R software. The R code is provided for some of the
data set examples in the multivariate statistics books listed below. It is hoped that students can run the
examples in R and compare results in the books that used SAS, IBM® SPSS® Statistics*, or STATA statistics
packages. The advantage of R is that it is free and runs on Windows, Mac, and LINUX operating systems.

The full textbooks also provide a more in-depth discussion of the assumptions and issues, as well as provide
data analysis and interpretation of the results using SPSS, SAS, and/or STATA. The several multivariate
statistics books I consulted and referenced are as follows:

Afifi, A., Clark, V., & May, S. (2004). Computer-aided multivariate analysis (4th ed.). Boca Raton, FL:
Chapman & Hall/CRC Press.
Hair, J. F., Jr., Black, W. C., Babin, B. J., & Anderson, R. E. (2010). Multivariate data analysis (7th
ed.). Upper Saddle River, NJ: Prentice Hall.
Meyers, L. S., Gamst, G., & Guarino, A. J. (2013). Applied multivariate research: Design and
interpretation (2nd ed.). Thousand Oaks, CA: Sage.
Raykov, T., & Marcoulides, G. A. (2008). An introduction to applied multivariate analysis. New York,
NY: Routledge (Taylor & Francis Group).
Stevens, J. P. (2009). Applied multivariate statistics for the social sciences (5th ed.). New York, NY:
Routledge (Taylor & Francis Group).
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston, MA: Allyn &
Bacon.

This book was written to provide researchers with access to the free R software when conducting multivariate
statistical analysis. There are many packages and functions available, which can be overwhelming, so I have
collected some of the widely used packages and functions for the multivariate methods in the book. Many of
the popular multivariate statistics books will provide a more complete treatment of the topics covered in this
book along with SAS and/or SPSS solutions. I am hopeful that this book will provide a good supplemental
coverage of topics in multivariate books and permit faculty and students to run R software analyses. The R
software permits the end users to customize programs to provide the type of analysis and output they desire.
The R commands can be saved in a script file for future use, can be readily shared, and can provide the user
control over the analytic steps and algorithms used. The advantages of using R software are many, including
the following:

Free software
The ability to customize statistical analysis
Control over analytic steps and algorithms used

Available on Windows, Mac, and Linux operating systems
Multitude of packages and functions to conduct analytics
Documentation and reference guides available

Data Sets
The multivariate textbooks listed above have numerous examples and data sets available either in their book or
on the publishers’ website. There are also numerous data sets available for statistical analysis in R, which can
be viewed by using the following R command(s):
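
For example, the built-in data() function lists the available data sets:
> data()                      # data sets in all loaded packages
> data(package = "datasets")  # data sets in the base datasets package only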

or, you can also enter the following URL to obtain a list:

http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html

The type of data set we would generally want is one that contains a set of continuous dependent variables
and a set of continuous independent variables. The correlation of the two linear sets of variables is the basis for
conducting many of the multivariate statistics covered in the book.

The input and use of the data sets are generally provided with a brief explanation and example in R code.
Overall, the use of the data sets can be enhanced by taking the time to study an R tutorial located at

http://ww2.coastal.edu/kingw/statistics/R-tutorials/dataframes.html

The following R commands are helpful in understanding the data set, where the data set name is specified for
each function; in this example, iris.
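
For example, a few base R functions that describe the iris data set are:
> data(iris)      # load the data set
> names(iris)     # variable names
> str(iris)       # structure of the data frame
> head(iris)      # first six rows
> summary(iris)   # descriptive summary of each variable
> dim(iris)       # number of rows and columns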

Input Data Files
There are many ways to input data files, depending on how the data are coded (Schumacker, 2014). You may
wish to use Notepad to initially view a data file. Commercial software packages have their own format (SPSS:
*.sav; SAS: *.sas; EXCEL: *.xls; etc.). A data file may be formatted with commas between the data values,
semicolons, a tab, or a space. Each file type requires specifying the separation type between data values using
the sep argument in one of the following R functions that reads the data file:
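
For example (the file names shown are placeholders):
> mydata <- read.table("mydata.txt", header = TRUE, sep = " ")
> mydata <- read.csv("mydata.csv", header = TRUE)
> mydata <- read.delim("mydata.txt", header = TRUE)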

The separation types in the sep argument are as follows:
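
sep = ","     comma between data values
sep = ";"     semicolon between data values
sep = "\t"    tab between data values
sep = " "     space between data values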

You can find out more about reading in data files with different separation types using >?read.table.

A useful approach for finding and reading data sets on your computer is to embed the file.choose() function.
This opens a dialog window and permits a search of your folders for the data set. Click on the data set, and it
is read into R. The R command would be as follows:
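
For example (the object name mydata is arbitrary):
> mydata <- read.table(file.choose(), header = TRUE, sep = " ")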

This command would find a data file with variable names on the first line (header = TRUE) and a space
between the data values.

Many statistical methods use a correlation or covariance matrix. Some use a partial correlation or partial
covariance matrix. The correlation and covariance matrices are computed by using the following commands,
respectively:
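
For example, for a data set named mydata:
> cor(mydata)   # correlation matrix
> cov(mydata)   # variance–covariance matrix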

The corpcor package has two useful functions that permit conversion in either direction: from correlation to partial correlation, or from partial correlation to correlation. This also applies to covariance matrices. In this example, the matrix is mymatrix.
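
A sketch using the cor2pcor() and pcor2cor() functions in corpcor:
> install.packages("corpcor")
> library(corpcor)
> cor2pcor(mymatrix)   # correlation (or covariance) matrix to partial correlations
> pcor2cor(mymatrix)   # partial correlation matrix back to correlations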

A chi-square test of whether two correlation matrices are equal is conducted using the following R commands.
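
One function with this behavior is cortest() in the psych package, shown here only as an illustration (R1, R2, n1, and n2 are hypothetical correlation matrices and sample sizes):
> library(psych)
> cortest(R1, R2, n1 = 100, n2 = 100)   # test whether two correlation matrices are equal
> cortest(R1, n1 = 100)                 # test whether a single matrix is an identity matrix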

Also, this function permits testing whether a single correlation matrix is an identity matrix.

You will find these functions very useful when running multivariate statistical analyses.

R Packages
The multivariate statistical analyses require the use of certain R packages. In the appendix, for each chapter, I
have compiled a list of the R packages, functions, data sets, and R script files I used to conduct the analyses.
This should provide a handy reference guide. You can also obtain a list of packages by
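
For example:
> library()              # packages installed in your R library
> installed.packages()   # detailed information for each installed package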

Information about a specific R package can be obtained by
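
For example, for the stats package:
> library(help = "stats")
> help(package = "stats")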

I recommend using the options in the pull-down menu whenever possible. The options include installing,
loading, and updating packages. You can also issue individual commands for these options:
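
For example (the package name psych is only an illustration):
> install.packages("psych")   # install a package
> library(psych)              # load an installed package
> update.packages()           # update installed packages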

You may receive a notice that a particular package runs under a certain version of R. When this occurs, simply
uninstall your current version of R in the Control Panel, and then install the newer version of R from the
website (http://www.r-project.org/).

There are two very important additions to the R software package. After installing R, either of these can make
your use of R much easier, especially in organizing files and packages. The two software products are
RCommander and RStudio. You will need to decide which one fits your needs. These are considered graphical
user interfaces, which means they come with pull-down menus and dialog windows displaying various types of
information. They can be downloaded from the following websites:

> http://www.rcommander.com/
> http://www.rstudio.com/

*SPSS is a registered trademark of International Business Machines Corporation.

Acknowledgments

The photographs of eminent statisticians who influenced the field of multivariate statistics were provided by living individuals and/or obtained from common sources on the Internet. The biographies were a compilation of excerpts
from common source Internet materials, comments in various textbooks, flyers, and conference pamphlets. I
would like to suggest sources for additional information about eminent statisticians that may be of interest to
scholars and academicians. First, Wikipedia (http://www.wikipedia.org/), which provides contributed
information on individuals in many different languages around the globe, and their list of many founders of
statistics (http://en.wikipedia.org/wiki/Founders_of_statistics). The American Statistical Association
(www.amstat.org) supports a website with biographies and links to many other statistical societies. The World
of Statistics (www.worldofstatistics.org) provides a website with famous statisticians’ biographies and/or links
to reference sources. A list of famous statisticians can be found on Wikipedia
(http://en.wikipedia.org/wiki/List_of_statisticians). Simply Google and you will find websites about famous
statisticians. Any errors or omissions in the biographies are unintentional, and in the purview of my
responsibilities, not the publisher’s.

SAGE Publications would like to thank the following reviewers:

Xiaofen Keating, The University of Texas at Austin
Richard Feinn, Southern Connecticut State University
James Alan Fox, Northeastern University
Thomas H. Short, John Carroll University
Jianmin Guan, University of Texas at San Antonio
Edward D. Gailey, Fairmont State University
Prathiba Natesan, University of North Texas
David E. Drew, Claremont Graduate University
Camille L. Bryant, Columbus State University
Darrell Rudmann, Shawnee State University
Jann W. MacInnes, University of Florida
Tamara A. Hamai, California State University, Dominguez Hills
Weihua Fan, University of Houston

About the Author

Randall E. Schumacker
is Professor of Educational Research at The University of Alabama. He has written and coedited several
books, including A Beginner’s Guide to Structural Equation Modeling (4th ed.), Advanced Structural
Equation Modeling: Issues and Techniques, Interaction and Non-Linear Effects in Structural Equation
Modeling, New Developments and Techniques in Structural Equation Modeling, Understanding Statistical
Concepts Using S-PLUS, Understanding Statistics Using R, and Learning Statistics Using R.
He was the founder and is now Emeritus Editor of Structural Equation Modeling: A Multidisciplinary
Journal, and he established the Structural Equation Modeling Special Interest Group within the
American Educational Research Association. He is also the Emeritus Editor of Multiple Linear
Regression Viewpoints, the oldest journal sponsored by the American Educational Research Association
(Multiple Linear Regression: General Linear Model Special Interest Group).
He has conducted international and national workshops, has served on the editorial board of several
journals, and currently pursues his research interests in measurement, statistics, and structural equation
modeling. He was the 1996 recipient of the Outstanding Scholar Award and the 1998 recipient of the
Charn Oswachoke International Award. In 2010, he launched the DecisionKit App for the iPhone,
iPad, and iTouch, which can assist researchers in making decisions about which measurement, research
design, or statistic to use in their research projects. In 2011, he received the Apple iPad Award, and in
2012, he received the CIT Faculty Technology Award at the University of Alabama. In 2013, he
received the McCrory Faculty Excellence in Research Award from the College of Education at the
University of Alabama. In 2014, he was the recipient of the Structural Equation Modeling Service
Award at the American Educational Research Association.

1 Introduction and Overview

Background
Persons of Interest
Factors Affecting Statistics
R Software
Web Resources
References

Background
Multivariate statistics can be described as containing two distinct methods: dependent and interdependent.
Dependent methods designate certain variables as dependent measures with the others treated as independent
variables. Multivariate dependent methods are associated with regression, analysis of variance (ANOVA),
multivariate analysis of variance (MANOVA), discriminant, and canonical analyses. Multivariate
interdependent methods are associated with factor, cluster, and multidimensional scaling analyses where no
dependent variable is designated. Interdependent methods search for underlying patterns of relations among
the variables of interest. Another characterization is to study multivariate statistics as two distinct approaches,
one that tests for mean differences and another that analyzes correlation/covariance among variables. This
book will present these two types of multivariate methods using R functions.

Persons of Interest
The book takes a unique perspective in learning multivariate statistics by presenting information about the
individuals who developed the statistics, their background, and how they influenced the field. These
biographies about the past noteworthy persons in the field of statistics should help you understand how they
were solving real problems in their day. The introduction of each chapter therefore provides a brief biography
of a person or persons who either developed the multivariate statistic or played a major role in its use.

Factors Affecting Statistics
An important concept in the field of statistics is data variability. As early as the 1890s, Karl Pearson, and later Sir Ronald Fisher, understood the role that data variance plays in statistics. Sir Ronald Fisher, in conducting experimental designs in the field of agriculture, knew that a test of mean differences would be fair only if the experimental and control groups had approximately equal variances. Karl Pearson, in developing his correlation coefficient while studying heredity, used the covariance of two variables together with each variable's variance to compute a measure of association. The amount of covariance indicated whether two variables were associated. In both cases, the amount of variance reflected individual differences. For example, if a dependent variable, plant growth, did not vary, then no individual differences existed. Similarly, if height did not covary with gender, then there would be no association between the two.

It is a basic fact that we are interested in studying why variation occurs. For example, if test scores were all the
same, hence the standard deviation or variance is zero, then we know that all students had the same test score
—no variance; that is, no student difference. However, when test scores do vary, we wish to investigate why
the test scores varied. We might investigate gender differences in mean test scores to discover that boys on
average scored higher than girls. We might correlate hours spent studying with test scores to determine if test
scores were higher given that a student spent more time studying—a relationship exists.

We should also understand situations, when studying variance, where the use of inferential statistics is not
appropriate. For example,

Sample size is small (n < 30)
N = 1 (astronomer studies only one planet)
Nonrandom sampling (convenience, systematic, cluster, nonprobability)
Guessing is just as good (gambling)
Entire population is measured (census)
Exact probabilities are known (finite vs. infinite population size)
Qualitative data (nonnumeric)
Law (no need to estimate or predict)
No inference being made from sample statistic to population parameter (descriptive)

When using statistics, certain assumptions should be met to provide for a fair test of mean differences or
correlation. When the statistical assumptions are not met, we consider the statistical results to be biased or
inaccurate. There are several factors that can affect the computation and interpretation of an inferential
statistic (Schumacker & Tomek, 2013). Some of them are listed here:

Restriction of range
Missing data
Outliers
Nonnormality

Nonlinearity
Equal variance
Equal covariance
Suppressor variables
Correction for attenuation
Nonpositive definite matrices
Sample size, power, effect size

A few heuristic data sets in Table 1.1 show the effect certain factors have on the Pearson correlation
coefficient. The complete data set indicates that Pearson r = .782, p = .007, which would be used to make an
inference about the population parameter, rho.

However, if missing data are present, Pearson r = .659, p = .108, a nonsignificant finding, so no inference
would be made. More important, if listwise deletion was used, more subject data might not be used, or if
pairwise deletion was used, then different sample sizes would be used for each bivariate correlation. We
generally desire neither of these choices when conducting statistical tests. The nature of an outlier (extreme
data value) can also cause inaccurate results. For data set A (Y = 27 outlier), Pearson r = .524, p = .37, a
nonsignificant finding, whereas for data set B with no outlier, Pearson r = − .994, p = .001. These data have
two very different outcomes based on a single outlier data value. The range of data also can affect correlation,
sometimes referred to as restriction of range (thus limiting variability). In the data set, Y ranges from 3 to 7
and X ranges from 1 to 4, with Pearson r = 0.0, p = 1.0. These values could easily have been taken from a
Likert scale on a questionnaire. A small sampling effect combined with restriction of range compounds the
effect but produces Pearson r = −1.00, p = 0.0. Again, these are two very different results. Finally, a nonlinear
data relation produces Pearson r = 0.0, which we are taught in our basic statistics course, because the Pearson
correlation measures linear bivariate variable associations. These outcomes are very different and dramatically
affect our statistical calculations and interpretations (Schumacker, 2014).

The different multivariate statistics presented in the book will address one or more of these issues. R functions
will be used to assess or test whether the assumptions are met. Each chapter provides the basic R commands
to perform a test of any assumptions and the multivariate statistics discussed in the chapter.

R Software
R is free software that contains a library of packages with many different functions. R can run on Windows,
Mac OS X, or UNIX computer operating systems, which makes it ideal for students today to use with PC and
Apple laptops. The R software can be downloaded from the Comprehensive R Archive Network (CRAN),
which is located at the following URL:
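
http://cran.r-project.org/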

Once R is downloaded and installed, you can obtain additional R manuals, references, and materials by issuing
the following command in the RGUI (graphical user interface) window:
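
For example:
> help.start()   # opens the HTML help pages with manuals and reference material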

To obtain information about the R stats package, issue the following command in the RGui Console window:
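
For example:
> library(help = "stats")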

This will provide a list of the functions in the stats package. An index of the statistical functions available in
the stats package will appear in a separate dialog box. The various functions are listed from A to Z with a
description of each. You will become more familiar with selecting a package and using certain functions as you
navigate through the various statistical methods presented in the book. A comprehensive Introduction to R is
available online at the following URL:
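
http://cran.r-project.org/doc/manuals/R-intro.html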

It covers the basics (reading data files, writing functions), statistical models, graphical procedures, and
packages.

R is a syntax-based command language as opposed to a point and click activation. A comparison could be
made between SAS (statistical analysis software; syntax commands) and SPSS, an IBM Company (statistical
package for the social sciences; point and click). The point and click activation is often referred to as a GUI.
Many software products are adopting mouse point and click activation to make them user friendly. However, although point and click makes it easy to execute commands (functions), the record of what was selected in the dialog boxes is lost after exiting the software. I instruct my students, for example, when using SPSS, to
always use the paste function and save the syntax. They can then recall what the point and click sequences
were that obtained the statistical results.

R uses simple syntax commands and functions to achieve results, which can be saved in a file and used at a
later date. This also permits adding additional commands or functions to a statistical analysis as needed. The
R commands can be grouped within braces to define a function or issued separately. The
appendix contains information for the installation and usage of R, as well as a reference guide of the various R
packages, functions, data sets, and script files used in the chapters of the book.

Using the R software has also been made easy by two additional free R software products that use GUI
windows to navigate file locations and operations. The software products are installed after you have installed
R. The two software products are Rcommander and RStudio. You can download and install these software
products at the following websites:
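
> http://www.rcommander.com/
> http://www.rstudio.com/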

Rcommander (Rcmdr) enables easy access to a selection of commonly used R commands with an output
window directly below the command line window. It provides a main menu with options for editing data,
statistics, graphs, models, and distribution types. A menu tree of the options is listed on the developer’s
website:
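
http://www.rcommander.com/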

RStudio software provides an easy menu to create and store projects. The RStudio GUI window is partitioned
into four parts. The first subwindow contains the data set, the second the console window with the R
commands, the third a list of data files and commands being used in the workspace, and the fourth a menu to
select files, plots, and packages or to seek help. It permits an easy way to locate and import packages that
would be used to compute your statistic or plot the data. A nice feature of RStudio is that it will prompt you
when software updates are available and activate the Internet download window for installation. In addition,
RStudio personnel provide training workshops.

Web Resources
R is a popular alternative to commercially available software packages, which can be expensive for the end
user. Given R popularity, several websites have been developed and supported, which provide easy access to
information and how-to-do features with R. Quick-R is easy to use, informative, and located at the following
URL:
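
http://www.statmethods.net/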

The website provides tutorials, a listing of books, and a menu that encompasses data input, data management,
basic statistics, advanced statistics, basic graphs, and advanced graphs. The R code listed in the many
examples can be easily copied, modified, and incorporated into your own R program file.

There are many R tutorials available by simply entering R tutorials in the search window of a browser. Some
tutorials are free, while others require membership (e.g., www.lynda.com). There is a blog website that
provides a fairly comprehensive list of R video tutorials at the following URL:

References
Schumacker, R. E. (2014). Learning statistics using R. Thousand Oaks, CA: Sage.

Schumacker, R. E., & Tomek, S. (2013). Understanding statistics using R. New York, NY: Springer-Verlag.

2 Multivariate Statistics: Issues and Assumptions

Issues
Assumptions
Normality
Determinant of a Matrix
Equality of Variance–Covariance Matrices
Box M Test
SPSS Check
Summary
Web Resources
References

Courtesy of Samuel Shapiro

Samuel Sanford Shapiro (1930 to present) was born in New York City. He was an American statistician and engineer and a statistics
graduate of City College in New York (now City University) in 1952. Samuel received an MS in industrial engineering at Columbia
University in 1954. He worked as a statistician in the Army Chemical Corps before he joined the General Electric Corporation. He
obtained his MS degree in 1960 and PhD in 1963 in statistics at Rutgers University. He was coauthor of the 1965 paper that
introduced the Shapiro–Wilk test and the 1972 paper introducing the Shapiro–Francia test. In 1972, he joined the faculty at Florida
International University.

Genest, Christian; Brackstone, Gordon. A Conversation with Martin Bradbury Wilk. Statistical Science, 25(2), 2010, 258–273. doi:10.1214/08-STS272. http://projecteuclid.org/euclid.ss/1290175846

Martin Bradbury Wilk (December 18, 1922, to February 19, 2013) was a Canadian statistician and academic. In 1965, together with
Samuel Shapiro, he developed the Shapiro–Wilk test. He was born in Montreal, Quebec, and received a bachelor of engineering
degree in chemical engineering from McGill University, Canada, in 1945. From 1945 to 1950, he was a Research Chemical
Engineer on the Atomic Energy Project at the National Research Council of Canada. From 1951 to 1955, he worked as a Research
Associate, Instructor, and Assistant Professor at Iowa State University, where he received a master of science in statistics in 1953 and
a PhD in statistics in 1955. From 1955 to 1957, he worked as the Assistant Director of the Statistical Techniques Research Group
at Princeton University. From 1959 to 1963, he was a Professor and Director of Research in Statistics at Rutgers University. In
1970, he joined AT&T, and from 1976 to 1980 he was the Assistant Vice President, Director of Corporate Planning. From 1980 to
1985, he was the Chief Statistician of Canada. In 1999, he was made an Officer of the Order of Canada for his insightful guidance
on important matters related to the country’s national statistical system.

DavidMCEddy at en.wikipedia Licensed under the Creative Commons Attribution-Share Alike 3.0 Unported

George E. P. “Pel” Box died on March 28, 2013, at the age of 93. George was born in Gravesend (Kent, England) in 1919. Among
other contributions to the field of statistics, he was known for the Box M test. The Box M test was used to test for the equality of
variance–covariance matrices in multivariate statistics. He began his scientific life as a chemist, publishing his first paper at the age of
19 on the activated sludge process to produce clean effluent. During his 6 years in the army, he eventually was sent to Porton Down
Experimental Station to study the potential impact of poison gases. He realized that only a statistician could get reliable results from
experiments, so he taught himself statistics, and a career was born. He worked at North Carolina State in 1953, where he met some
of the preeminent statisticians of the day. In 1956, he went to Princeton University to direct a statistical research group. George
came to Madison in 1959 and established the University of Wisconsin’s Department of Statistics in 1960, and he retired as an
emeritus professor in 1992. George cofounded the University of Wisconsin Center for Quality and Productivity Improvement with
William “Bill” Hunter in 1985. He wrote and coauthored major statistical books on evolutionary operation, time series, Bayesian
analysis, the design of experiments, statistical control, and quality improvement. His last book, a memoir called An Accidental
Statistician: The Life and Memories of G. E. P. Box, was published by Wiley in 2013.

Issues
There are several issues that can affect the outcome of a multivariate statistical analysis. Multivariate statistics
differs from univariate statistics in that more than one dependent variable is specified. Therefore, the number
of dependent variables may affect the results. It has been suggested that five dependent variables are the most
one should use (Stevens, 2009). If the dependent variables are not correlated, then each would add a unique
explained variance to the results. When the dependent variables are highly correlated, results would be severely
affected. The higher the level of correlation among the dependent variables, the more negative an effect it has on the multivariate analysis. So the number of dependent variables and their intercorrelation will affect
multivariate statistical analyses.

Multicollinearity, or intercorrelation among the independent variables, will also affect the results. If the
independent variables have a high level of correlation among themselves, they would explain more variation
among themselves and less explained variance in the dependent variables. The analysis and interpretation
would therefore be severely affected.

In some cases, a nonpositive definite matrix arises, which indicates correlation values that are out of bounds,
that is, greater than 1.0. Sometimes, you may also find an error message indicating Heywood cases. This
occurs when a variable has a negative variance, which is not permitted in statistical analysis (recall variance is
always positive, while covariance terms can be positive or negative). The sphericity assumption requires that
the variance of the differences in pairs of repeated measures be equal across time periods. Compound
symmetry, which is a more stringent condition for conducting multivariate repeated measures, requires that
the population variances and covariances are equal across time periods. The issues listed here will be further
discussed as they relate to topics in the relevant chapters of the book for the respective multivariate statistical
methods.

Number of dependent variables
Correlation among dependent variables
Multicollinearity: Independent variable correlation
Positive definite matrix
Heywood cases
Sphericity
Compound symmetry

Assumptions
There are a few important assumptions that, if not met, can affect the multivariate statistical analysis. They
are normality, determinant of a matrix, and equality of the variance–covariance matrix. Normality should be
checked for each individual variable, as well as for overall multivariate normality. Generally, not all individual variables have to display normality in multivariate statistics (Stevens, 2009). In some cases, a data
transformation can produce a normally distributed variable.

Normality
You can find R functions for the different types of normality tests using the following R command:
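
One way is to search the installed help system, for example:
> help.search("normality test")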

This returns a list of different multivariate normality tests. I chose the Shapiro–Wilk test (Shapiro & Wilk,
1965) because it is the most powerful in detecting departures from normality (Razali & Wah, 2011; Stevens,
2009, p. 223). For example, the R package mvnormtest is listed, which has a function mshapiro.test(), for the
Shapiro–Wilk multivariate normality test. The argument that needs to be supplied in the function is the U
value, which represents a transposed numeric matrix of the data values; the number of observations must be between 3 and 5,000. There are a few
simple steps required before running the R function:
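
For example:
> install.packages("mvnormtest")
> library(mvnormtest)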

You can alternatively select Packages and Install packages() from the main menu. A few other simple
commands are helpful to check your library of packages and the list of data sets available in R.
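
For example:
> library()   # packages in your library
> data()      # data sets available in the loaded packages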

Next, we need to transpose the data file so that columns are rows before conducting the Shapiro–Wilk
multivariate normality test. The t() function is used to transpose the data file. We are now ready to run the
mshapiro.test() function in the mvnormtest package on the attitude data set.
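
A minimal sketch (the object name x is arbitrary):
> data(attitude)
> x <- t(attitude)   # transpose so that variables are rows
> mshapiro.test(x)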

The Shapiro–Wilk p value of .0002 indicates that the multivariate normality assumption does not hold. It
indicates that one or more individual variables are not normally distributed.

Another R package for conducting normality tests, Jarque–Bera test (Jarque & Bera, 1987) for observations
and regression residuals, is located in the R package, normtest. The set of commands are as follows:
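
A sketch, assuming the jb.norm.test() function in normtest applied one variable at a time:
> install.packages("normtest")
> library(normtest)
> jb.norm.test(attitude$rating)   # repeat for each variable, or use apply(attitude, 2, jb.norm.test)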

The Jarque–Bera test for normality agreed with the Shapiro–Wilk test that one or more variables are not
normally distributed.

We can also check the individual variable skewness, kurtosis, and normality for each of the variables in the
attitude data set. You can use the R package, normwhn.test. The following R commands were used.

This normality test includes individual variable skewness and kurtosis values and an omnibus test of normality.
The skewness and kurtosis results are shown in Table 2.1. The results indicated that the variables overall did
not contain skewness or kurtosis. The omnibus normality test indicated that the data were normally
distributed, Z = 12.84, df = 14, p = .54.

There are R packages that would indicate each individual variable’s normality. The R package nortest contains
five normality tests: (1) the ad.test (Anderson–Darling), (2) cvm.test (Cramer–von Mises), (3) lillie.test
(Kolmogorov–Smirnov), (4) pearson.test (Pearson chi-square), and (5) sf.test (Shapiro–Francia). Thode
(2002) discussed the pros and cons of each normality test. The Anderson–Darling test is recommended by M.
A. Stephens (1986). You would install the package, load the package, and run each function with each of the
variables in the data set. The initial set of R commands were as follows:
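
For example:
> install.packages("nortest")
> library(nortest)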

The five normality tests on each variable in the data set can be run with these sets of commands:
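
Shown here for the first variable, rating:
> ad.test(attitude$rating)        # Anderson–Darling
> cvm.test(attitude$rating)       # Cramer–von Mises
> lillie.test(attitude$rating)    # Lilliefors (Kolmogorov–Smirnov)
> pearson.test(attitude$rating)   # Pearson chi-square
> sf.test(attitude$rating)        # Shapiro–Francia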

Note: The five normality tests were repeated for each of the variables in the data set.

The results are compared in Table 2.2. All five normality tests showed that the variable critical violated the
normality assumption. This is why the Shapiro–Wilk and Jarque–Bera tests indicated that the multivariate
normality assumption was not met. However, this single variable, critical, although not normally distributed,
would generally not affect the power to detect a difference in a multivariate statistical analysis (Stevens, 2009).

Determinant of a Matrix
Matrices are designated by capital letters, for example, M. A determinant of matrix M is denoted by |M| and
yields a unique number for the matrix. There are two important reasons why the determinant value of a matrix
is important in multivariate statistics. First, the determinant of a variance–covariance matrix represents the
generalized variance among the several variables. The determinant of a matrix is defined as how much variance
is contained in the matrix of variables taking into account the covariances among variables, which indicate
how much overlap there is in the variances among pairs of variables. This should not be confused with the
trace of a matrix. The trace is a single value that represents the sum of the individual variable variances in the
diagonal of a variance–covariance matrix. The trace of a covariance matrix is the total variance of the data in
the matrix. The determinant (generalized variance) takes into account the redundancy implied by the
covariance in the matrix, while the trace (total variance) does not. Second, the determinant is used in the
calculation of the multivariate test statistic. For example, Wilks's Λ (lambda) represents the ratio of two determinants in multiple regression. In multivariate analysis of variance, Wilks's Λ indicates whether several groups differ on the set of variables in the matrix (Λ = |W|/|T|, where W is the within-groups sums of squares and cross-products matrix and T is the total sums of squares and cross-products matrix).

There are formal mathematical operations to compute the determinant of a matrix, but these are not covered here for two reasons. First, the determinant can be computed directly with an R function. Second, almost every multivariate statistics text has a discussion and computation of the determinant of a matrix (Stevens, 2009). Also, many statistical
packages now show the determinant value for the data matrix in the computer output. When the determinant
of a matrix is zero or negative, generally a warning message will appear and the program will stop running (I
think you will now be looking for it!).

The det() function in R will return the determinant of a square matrix. A square matrix has an equal number of rows and columns, so a lower triangular matrix (only half of a correlation or covariance matrix) is not permitted. Square matrices are also required for many of the matrix operations used in multivariate statistics. The data set attitude in the stats
package will be used to compute the determinant of a matrix. We will need to first convert the data set
attitude to a square matrix—either a correlation matrix or a variance–covariance matrix. The R commands for
correlation matrix are as follows:
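
For example (mycor is the matrix object referenced later in the chapter):
> data(attitude)
> mycor <- cor(attitude)
> mycor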

Next, the determinant is computed using the R function det().
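
For example:
> det(mycor)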

Note: The default correlation type in cor() is the Pearson correlation coefficient.

The R commands to create a variance–covariance matrix and to obtain the determinant of the matrix are as
follows:
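
For example (the object name mycov is arbitrary):
> mycov <- cov(attitude)
> det(mycov)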

The determinant (generalized variance) of the matrix is positive; therefore, multivariate statistical analysis can proceed.

Note: If you want decimal places, rather than scientific notation, issue the following command:
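
One common way is:
> options(scipen = 999)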

An additional R function can be useful when wanting a correlation matrix from a variance–covariance matrix.
The function cov2cor() scales a covariance matrix by its diagonal to compute the correlation matrix. The
command is as follows:
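
For example, applied to the covariance matrix created above:
> cov2cor(mycov)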

You will get this same correlation matrix if you list the matrix mycor, which was previously created from the
attitude data set.

Equality of Variance–Covariance Matrix
The equality of variance–covariance matrices is an important assumption in multivariate statistics. In
univariate statistics, we check the assumption of equal group variance before conducting an independent t test
or ANOVA (analysis of variance) statistical test of group mean differences (Bartlett or Levene’s test). In
multivariate statistics, we check for the equality of the variance–covariance matrix using the Box M test.

The attitude data set is now modified with the following R commands to include a grouping variable. This
will permit creation of separate group variance–covariance matrices and the calculation of the determinants of
the matrix for each group required for the Box M test. The R commands to add a group membership variable
are as follows:
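
A sketch, assuming an even split of the 30 attitude observations into group 1 (boys) and group 2 (girls):
> data(attitude)
> group <- c(rep(1, 15), rep(2, 15))       # assumed coding: 1 = boys, 2 = girls
> newdata <- data.frame(attitude, group)   # group becomes the 8th column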

The within variance–covariance matrices and associated determinants for each group can now be calculated
with the following R commands.
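
A sketch using the group coding assumed above (object names are arbitrary):
> covboys <- cov(newdata[newdata$group == 1, -8])
> covgirls <- cov(newdata[newdata$group == 2, -8])
> det(covboys)
> det(covgirls)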

The determinants of the boys' and girls' variance–covariance matrices were positive; thus, multivariate statistical analysis can proceed.

We can obtain the descriptive statistics by using the R package psych and the describeBy() function. We install
and load the psych package, and then we issue the command for the function.
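
For example:
> install.packages("psych")
> library(psych)
> describeBy(newdata[, -8], group = newdata$group)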

Only the output for the means and standard deviations of the variables is reported in Table 2.3. The
standard deviations, hence variance values, do look a little different between the boys and the girls across the
variables.

We can create and list the separate covariance matrices for each group in the newdata data set by using the
lapply() function, where the group variable is deleted (−8) as follows:
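
A sketch (the list name covlist is arbitrary):
> covlist <- lapply(split(newdata[, -8], newdata$group), cov)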

We can list the boys’ covariance matrix as follows:
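
> covlist[["1"]]   # group 1 (boys, under the coding assumed above)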

We can list the girls’ covariance matrix as follows:
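
> covlist[["2"]]   # group 2 (girls)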

The first approach obtained the separate group variance–covariance matrices more easily (with less sophisticated
programming). We can easily convert these separate variance–covariance matrices into correlation matrices
using the cov() function and the cov2cor() function as shown before. The covariance matrices exclude the
grouping variable by indicating a −8 value (column location in the matrix) in the selection of variables to
include. The two sets of R commands are as follows:
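
A sketch for the boys (group 1 under the coding assumed above):
> cov2cor(cov(newdata[newdata$group == 1, -8]))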

and
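
> cov2cor(cov(newdata[newdata$group == 2, -8]))   # girls (group 2)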

The separate correlation matrices for the groups are listed below.

Box M Test
Box (1949) developed the theory and application related to testing equal variance–covariance matrices
between groups in the multivariate case, which was an extension of the Bartlett test (Bartlett, 1937) used in
the univariate homogeneity of variance test. The Box M test uses the determinants of the within-covariance
matrices of each group—that is, the generalized variances of each group. The Box M test is sensitive to departures from multivariate normality, so normality should be checked first before checking the equality of group
variance–covariance matrices. Simply stated, the Box M test may be rejected due to a lack of multivariate
normality rather than the covariance matrices being unequal. The Shapiro–Wilk test indicated that the
multivariate normality assumption did not hold, that is, one or more individual variables were not normally
distributed. On further inspection, the variable critical was the only one not normally distributed. In this
situation, we can proceed by either using a data transformation technique or continuing with the analysis,
given it has little impact on the power to detect a difference (Stevens, 2009, pp. 222–224).

The biotools package has a boxM() function for testing the equality of covariance matrices between groups.
The package can be installed from the main menu. The package example omits specifying the grouping variable as a factor. The basic example shown here, however, declares the grouping variable as a factor; therefore, the complete code would be as follows:
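
A sketch, assuming the boxM(data, grouping) argument order used by biotools:
> install.packages("biotools")
> library(biotools)
> data(iris)
> boxM(iris[, -5], as.factor(iris[, 5]))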

Note: The use of iris[,-5] selects variables and iris[,5] indicates grouping variable in data file.

In the iris data set, the null hypothesis of equal variance–covariance matrices was rejected. The groups did
have different variance–covariance matrices. A multivariate analysis would therefore be suspect, especially if
nonnormality existed among the variables.

The R code steps are listed below. First, install the package from the main menu. Next, load the biotools
package and the data set. You must declare the variable group as a factor before running the Box M test
function. The set of R commands are as follows:
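
A sketch using the newdata data set and the group coding assumed earlier:
> library(biotools)
> newdata$group <- as.factor(newdata$group)
> boxM(newdata[, -8], newdata[, 8])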

The results indicated that the boys and the girls variance–covariance matrices are equal; that is, the chi-square
test was nonsignificant.

SPSS Check
Given so many packages and functions available in R, it is important to check the accuracy of various R
functions that individuals have contributed to the R software library. You can do this by running the data in
SPSS or SAS procedures. For the newdata data set, a two-group MANOVA in SPSS (v. 20) was run to
obtain the Box M test results. SPSS returned the following values: Box M = 46.251 (F = 1.2, df = 28, 2,731, p
= .216). The Box M test result in the biotools package was similar to that from SPSS. The null hypothesis was
retained: The observed variance–covariance matrices of the dependent variables for the two groups were equal.

Summary
In this chapter, a few key issues that affect multivariate statistics were listed, namely, the number of dependent
variables, the correlation among the dependent variables, the correlation among the independent variables
(multicollinearity), nonpositive definite matrix, Heywood case, and sphericity. It makes sense that when
dependent variables are highly intercorrelated, they would explain a single dimension or construct. Similarly, if
independent variables are highly intercorrelated, they detract from explaining dependent variable variance and
more likely identify a single dimension or construct. The ideal analysis would be one in which the dependent variables are not correlated with each other and the independent variables are not correlated with each other (orthogonal), which seldom occurs, so it is important to check the severity of these issues and their impact on the analysis. For example, if the determinant of a matrix is zero or negative, a nonpositive definite matrix exists, and therefore parameter estimates can't be computed. Similarly, in some multivariate analyses, a Heywood case may appear, that is, a variable with negative variance, which is not permitted in statistical formulas. Finally, in repeated measures designs, a violation of sphericity would negate the assumption of equal variance–covariance across time periods, thus making parameter comparisons biased.

Three basic assumptions were also covered in the chapter because of their impact on the statistical results.
Normality, both at the univariate and multivariate variable levels should be checked. Multivariate statistics are
generally robust to violations of normality; however, data transformations can help if a few variables are
severely skewed. I provided an example where five different normality tests were compared, with similar
results. The determinant of a matrix is of great importance since it indicates the generalized variance of a set
of variables in a matrix. If the determinant of a matrix is zero or negative, then statistical computations are not
possible in matrix operations. Most statistical packages will routinely indicate the determinant of a matrix
before proceeding with estimation of statistical parameters. Finally, the assumption of equal variance between
groups in the univariate case is also important in the multivariate case. The test of equal variance–covariance
matrices is a more stringent test than in the univariate case because of multiple variable relations. When these
assumptions are not met, other options discussed later in the book are used.

This chapter covered the basic issues and assumptions that impact the calculations of parameter estimates in
multivariate statistics. The key issues were as follows:

Number of dependent variables and correlation among the dependent variables
Multicollinearity among the independent variables
Presence of nonpositive definite matrix, Heywood cases, or sphericity

The important assumptions discussed were as follows:

Multivariate normal distribution
Determinant of a matrix
Equality of variance–covariance matrices

Web Resources
Package nortest (July, 2014)

http://cran.r-project.org/web/packages/nortest/nortest.pdf

Package normtest (March, 2014)

https://dspace.spbu.ru/bitstream/123456789/1021/1/normtest%20manual.pdf

References
Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Society of
London Series A, 160, 268–282.

Box, G. E. P. (1949). A general distribution theory for a class of likelihood criteria. Biometrika, 36, 317–346.

Jarque, C. M., & Bera, A. K. (1987). A test for normality of observations and regression residuals.
International Statistical Review, 55, 163–172.

Razali, N., & Wah, Y. B. (2011). Power comparisons of Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors and
Anderson–Darling tests. Journal of Statistical Modeling and Analytics, 2(1), 21–33.

Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples).
Biometrika, 52(3–4), 591–611.

Stephens, M. A. (1986). Tests based on EDF statistics. In R. B. D’Agostino & M. A. Stephens (Eds.),
Goodness-of-fit techniques. New York, NY: Marcel Dekker.

Stevens, J. P. (2009). Applied multivariate statistics for the social sciences (5th ed.). New York, NY:
Routledge (Taylor & Francis Group).

Thode, H. C., Jr. (2002). Testing for normality. New York, NY: Marcel Dekker.

3 Hotelling’s T2: A Two-Group Multivariate Analysis

Chapter Contents
Overview
Assumptions
Univariate Versus Multivariate Hypothesis
Statistical Significance
Practical Examples Using R
Single Sample
Two Independent Group Mean Difference
Two Groups (Paired) Dependent Variable Mean Difference
Power and Effect Size
A Priori Power Estimation
Effect Size Measures
Reporting and Interpreting
Summary
Exercises
Web Resources
References

Courtesy of the State Library of North Carolina

The Hotelling’s T2 was developed by Harold Hotelling (1895–1973) to extend the univariate t test with one dependent variable to a
multivariate t test with two or more dependent variables (Hotelling, 1931). He attributes his interest in statistics to his professional
relations with R. A. Fisher. He was an associate professor of mathematics at Stanford University from 1927 to 1931. He was a
member of the Columbia University faculty from 1931 to 1946. While at Columbia University, he sponsored Henry Mann
(nonparametric Mann–Whitney U statistic) and Abraham Wald (decision theory, statistical sequential analysis) due to European
anti-Semitism. Hotelling is well-known for his vision that universities should have a department of statistics. He spent much of his
career as a professor of mathematical statistics at the University of North Carolina at Chapel Hill from 1946 until his death in 1973.

Overview
So why use more than one dependent variable? There are two main reasons. First, any treatment will affect
participants in more than one way, so multiple measures on several criteria provide a more valid assessment of
group differences (experimental vs. control). Second, the use of more criteria measures permits a better profile
of group differences.

We can also examine the following question: “Why use a multivariate analysis rather than a univariate
analysis?” There are several reasons from a statistical point of view.

First, the Type I error rate is inflated when using several univariate tests; for example, two univariate t tests
would have an overall probability of no Type I error of (.95)(.95) = .90, so 1 − .90 = .10 is the probability of falsely rejecting at least one true null hypothesis (the experimentwise Type I error rate), not the individual Type I error rate of .05. A researcher could test each
univariate t test at the .025 level to avoid an inflated Type I error rate. This has been referred to as the Dunn–
Bonferroni adjustment to the alpha level, where the alpha level is divided by the number of tests; for example,
.05 divided by 2 = .025. The multivariate test could incorporate both the tests and keep the alpha level at the
.05 level, thus maintaining the power for the test of group mean differences. The second reason is that the
univariate test ignores covariance (correlation) among dependent variables. The separate univariate t tests
would not include the relation among the dependent variables. Another good reason to conduct multivariate
analyses is when a set of dependent variables have a theoretical basis or rationale for being together. The third
reason is that a researcher may not find a single univariate mean difference between groups, but jointly, a
mean difference may exist when considering the set of dependent variables. These three reasons for
conducting a multivariate analysis provide a sound rationale to consider when analyzing data with multiple
dependent variables.

Stevens (2009) pointed out that a researcher may not find a multivariate joint group mean difference for all
dependent variables, so a researcher should check for subsets of dependent variables, which might be
statistically significant. This situation may arise when a researcher uses subtest scores for the set of dependent
variables, rather than using a total test score. Basically, one or more subtest mean differences may exist
between the two groups, but the total test score mean is not statistically different. Similarly, two dependent
variables might indicate multivariate statistical significance, but a third variable when included may suppress
or negate the statistical significance of the other two variables.

Assumptions
When conducting the Hotelling T2 test, it is important to consider the data assumptions that affect the
statistical result. Four assumptions are important when computing the Hotelling T2 test of
group mean differences:

1. The data from population i are sampled from a population with mean vector μi.
This assumption implies that there are no subpopulations with different population means. A
randomized experiment with subjects randomly assigned to experimental and control groups
would meet this assumption.
2. The data from both populations have a common variance–covariance matrix—∑.

We can test the null hypothesis that ∑1 is equal to ∑2 against the general alternative that they are
not equal using a Box M test:
$$H_0\!: \Sigma_1 = \Sigma_2 \qquad H_A\!: \Sigma_1 \neq \Sigma_2$$

Under the null hypothesis, H0: ∑1 = ∑2, Bartlett’s test statistic is approximately chi-square
distributed with P(P + 1)/2 degrees of freedom; P = number of variables. If the Bartlett’s test is
statistically significant, then we reject the null hypothesis and assume that the variance–covariance
matrices are different between the two groups.
3. The data values are independent.
Subjects from the two separate populations were independently and randomly sampled. This does not
mean that the variables are independent of one another.
The independence assumption is violated when using nonprobability, clustered, time series, and
spatial sampled data. If data are dependent, then the results for some observations are going to be
predictable from the results of other observations (linear dependency). The consequence of
violating the assumption of independence is that the null hypothesis is rejected more often than if
the independence assumption is met, and linear dependency results in a nonpositive definite
matrix.
4. Both populations of data are multivariate normally distributed.
We can check this using the following approaches:
Produce histograms for each variable to check for a symmetric distribution.
Produce scatter plots of variables to check for an elliptical display of points.
Run a Shapiro–Wilk test of multivariate normality.

Notes:

The central limit theorem states that the dependent variable sample means are going to be approximately multivariate normally distributed regardless of the distribution of the original variables.
Hotelling’s T2 test is robust to violations of assumptions of multivariate normality; however, the Box M
test should not be used if data are not multivariate normally distributed.
Hotelling’s T2 test is sensitive to violations of the assumption of equal variance–covariance matrices,
especially when sample sizes are unequal, that is, n1 ≠ n2. If the sample sizes are equal, the Hotelling’s
T2 test is more robust.

Univariate Versus Multivariate Hypothesis
The expression of the univariate and multivariate hypotheses shows the extension of the univariate t test with
a single dependent variable to the multivariate t-test case with multiple dependent variables. Instead of a
single comparison of means between two groups, we express multiple dependent variable means for each
group in a matrix vector. The univariate null hypothesis is expressed as follows:
$$H_0\!: \mu_1 = \mu_2,$$

and the univariate t test is computed as follows:


$$t = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}.$$

When the denominator of the formula is expressed as a pooled estimate of the common population variance
for the two groups, squaring both sides reduces the formula to
$$t^2 = \frac{(\bar{y}_1 - \bar{y}_2)^2}{s_{\text{pooled}}^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)},$$

which can be expressed as follows:


$$t^2 = \frac{n_1 n_2}{n_1 + n_2}(\bar{y}_1 - \bar{y}_2)\left(s_{\text{pooled}}^2\right)^{-1}(\bar{y}_1 - \bar{y}_2).$$

The multivariate null hypothesis with P dependent variables is expressed in a matrix vector as follows:
$$H_0\!: \begin{pmatrix} \mu_{11} \\ \mu_{21} \\ \vdots \\ \mu_{P1} \end{pmatrix} = \begin{pmatrix} \mu_{12} \\ \mu_{22} \\ \vdots \\ \mu_{P2} \end{pmatrix},$$

and the Hotelling T2 multivariate t test, which replaces each group mean with a vector of means ($\bar{Y}_1$ and $\bar{Y}_2$) for each group, is computed as follows:

$$T^2 = \frac{n_1 n_2}{n_1 + n_2}(\bar{Y}_1 - \bar{Y}_2)' S^{-1}(\bar{Y}_1 - \bar{Y}_2).$$

Note: S−1 is the inverse of the pooled estimate of the common population covariance matrix of the dependent variables for both groups,
and capital Y letters are used to denote the vectors of means.

We see from the univariate t-test formula that the two sample means for each group are replaced in the
multivariate t test with a vector of means based on the number of dependent variables. Similarly, the common
population covariance matrix in the univariate t test is expanded to include more than one dependent variable
in the multivariate t test. The univariate and multivariate t-test formulas should look similar except for the
inclusion of the matrix vector notation.

Statistical Significance
The univariate t test has a table of critical t-test values with varying degrees of freedom for checking statistical
significance, while the Hotelling T2 multivariate t test does not. However, statistical significance for both the
univariate and multivariate t test can be tested using an F test.

The Hotelling T2 statistic uses the sample size of each group, a vector of mean differences between groups,
and the pooled sample estimate of the population variance–covariance matrix of the dependent variables. An
assumption that the groups have equal variance–covariance matrix is required before testing for mean
differences, which is generally computed as the Box’s M test. The test of equal variance–covariance matrices
between groups is an extension of the assumption in the univariate case, which is tested using the Levene’s test
of equal variance between two or more groups.

The Hotelling T2 statistic is tested for significance using the F test. The F-test formula uses the sample sizes
of each group, the number of dependent variables (P), and of course the T2 value. The critical F value with
numerator and denominator degrees of freedom (df) for α = .05, .01, and .001 can be found in statistical tables
for F values; however, software today reports the F test of statistical significance. Given the degrees of
freedom as follows:
$$df_1 = P, \qquad df_2 = n_1 + n_2 - P - 1.$$

The F value is computed as follows:


$$F = \left(\frac{df_2}{df_1}\right) T^2.$$

Practical Examples Using R
The multivariate t test(s) parallel the three types of group mean difference tests computed in the univariate
case: (1) single sample, (2) independent sample, and (3) dependent (paired) sample (Hotelling T2 R tutorial at
http://www.uni-kiel.de/psychologie/rexrepos/posts/multHotelling.html). You will need to have the R
software installed to conduct these mean difference tests, and, optionally, the Rcommander or RStudio software
(see Preface). Once the software is installed, the R script commands can be entered and run for each type of
group mean difference test.

Single Sample
The single-sample multivariate t test is computed when you have several dependent variables for a single
sample and hypothesize that the vector of means is statistically different from zero (null hypothesis).
Alternatively, the vector of dependent variable means could be tested for statistical significance from a
specified population mean. An educator might conduct a single-sample multivariate t test when obtaining
students’ test scores on two or more tests, for example, midterm and final exams in a class. Alternatively, a
teacher might test whether her students’ SAT and ACT scores were statistically different from the population
norms for the tests. The first step in conducting a multivariate single-sample t test is to install the R
package(s) and load the functions. The second step is to read in or create the sample data frame for the
number of dependent variables. A third step is to compute and print out the correlation between the
dependent variable(s) and compute the means and standard deviations of the dependent variables. A fourth
step could include a graph of the means for the dependent variables to visually show the magnitude of mean
difference. Finally, a Hotelling T2 test is computed. The R function output labels its test statistic T.2, which is
evaluated as an F value. The results of each step are output after running the R code for each example.

The following single-sample multivariate t test has two dependent variables, Y1 and Y2. The first dependent
variable has scores that indicate the number of points subtracted from a pop quiz. The second dependent
variable has scores that indicate the number of points awarded on a homework assignment. The teacher wants
to test whether the joint means for these two dependent variables are statistically different from zero for her 10
students. The R code for the necessary steps is highlighted, and the results are listed below each step.

R Code: Hotelling T2 Single Sample
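The book presents this code as a screenshot. The sketch below is a minimal base-R version of the steps described above; the scores are hypothetical placeholders (chosen only so the two means match the −4.8 and 4.4 reported below), so the correlation and test statistic will not reproduce the book's output exactly.

```r
# Hypothetical scores for 10 students (placeholders, not the book's original data)
Y1 <- c(-5, -4, -6, -5, -4, -5, -5, -4, -6, -4)  # points subtracted on the pop quiz (mean = -4.8)
Y2 <- c( 5,  4,  6,  3,  5,  4,  4,  5,  4,  4)  # points awarded on the homework (mean = 4.4)
Y  <- cbind(Y1, Y2)

cor(Y)                        # correlation between the dependent variables
colMeans(Y)                   # dependent variable means
apply(Y, 2, sd)               # dependent variable standard deviations

# Single-sample Hotelling T2 test that the joint mean vector equals c(0, 0)
n <- nrow(Y); p <- ncol(Y)
d <- colMeans(Y) - c(0, 0)
T2    <- drop(n * t(d) %*% solve(cov(Y)) %*% d)
Fstat <- (n - p) / (p * (n - 1)) * T2            # F approximation with p and n - p df
c(T2 = T2, F = Fstat, p.value = pf(Fstat, p, n - p, lower.tail = FALSE))
```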

The results for the single-sample multivariate t test indicated that the two dependent variable means together
are statistically significantly different from zero. The correlation matrix indicated that the two dependent
variables were correlated, r = − .587. The Hotelling T2 value was statistically significant: T.2 = 18.089 with 2
and 8 df, and p = .001. Therefore, the null hypothesis of no joint mean difference is rejected. The alternative
hypothesis is accepted, which reflects a test of whether the joint sample means are different from zero [true location difference is not equal to c(0,0)].

Two Independent Group Mean Difference
The two independent group multivariate t test is used when you hypothesize that a set of dependent variable group
means differs between two independent groups, for example, Rogerian and Adlerian counselors. The R
code is highlighted for testing the null hypothesis of no mean difference, and the output is listed after the R
code. I have placed comments before sets of R command lines to provide a brief explanation of what each set
of commands is doing. There are three Rogerian counselors and six Adlerian counselors measured on two
dependent variables by their clients. The first measure was counseling effectiveness and the second measure
was counseling satisfaction based on a 10-point numerical scale.

R Code: Hotelling T2 (Two Independent Samples)
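The book's code appears as a screenshot; the following is a minimal base-R sketch of the same steps. The ratings are hypothetical values chosen so the group means match those discussed below (Rogerian 2 and 4; Adlerian 5 and 8), so the remaining statistics will differ from the book's output. The F conversion shown is the standard one with P and n1 + n2 − P − 1 degrees of freedom.

```r
# Hypothetical client ratings for 3 Rogerian and 6 Adlerian counselors (not the book's original data)
group  <- factor(c(rep("Rogerian", 3), rep("Adlerian", 6)))
effect <- c(1, 2, 3, 4, 4, 5, 5, 6, 6)    # counseling effectiveness (10-point scale)
satisf <- c(3, 4, 5, 6, 7, 8, 8, 9, 10)   # counseling satisfaction (10-point scale)
Y <- cbind(effect, satisf)

cor(effect, satisf)                       # correlation between the dependent variables
by(data.frame(Y), group, colMeans)        # group mean vectors

# Box M test of equal covariance matrices (biotools package, also used in Chapter 4)
# library(biotools); boxM(Y, group)

# Hotelling T2 for two independent groups from the pooled covariance matrix
n1 <- sum(group == "Rogerian"); n2 <- sum(group == "Adlerian"); p <- ncol(Y)
d  <- colMeans(Y[group == "Rogerian", ]) - colMeans(Y[group == "Adlerian", ])
Sp <- ((n1 - 1) * cov(Y[group == "Rogerian", ]) +
       (n2 - 1) * cov(Y[group == "Adlerian", ])) / (n1 + n2 - 2)
T2    <- drop((n1 * n2) / (n1 + n2) * t(d) %*% solve(Sp) %*% d)
Fstat <- (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
c(T2 = T2, F = Fstat, p.value = pf(Fstat, p, n1 + n2 - p - 1, lower.tail = FALSE))
```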

The results show that the two dependent variables were positively correlated, r = .829. The theoretical
meaningfulness and correlation of the two dependent variables provided the rationale for conducting the
multivariate t test. The first dependent variable had mean = 4 and standard deviation = 1.73, and the second
dependent variable had mean = 6.67 and standard deviation = 2.78. The Box M test indicated that the
covariance matrices were not statistically different, so we assumed them to be equal and proceeded with the
multivariate t test. The results indicated that T.2 = 9, with 2 and 6 df and p = .016 (Note: The function labels its statistic
T.2 but reports it as an F value; here F = 9, which corresponds to T2 = 3, as shown in the effect size section). The null hypothesis of no group mean
difference is rejected. The alternative hypothesis is accepted—true location difference is not equal to c(0,0)—
which indicates that the two groups, Rogerian and Adlerian, had a statistically significant joint mean
difference for counseling effectiveness and counseling satisfaction by clients. A graph of the individual group
means for counseling effectiveness and counseling satisfaction shows that Adlerian counselors had higher
client means than Rogerian counselors on both dependent variables.

Tip:
When covariance matrices are not homogeneous, a Wald test would be computed. The R code is as follows:
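The book's Wald-test code is shown as a screenshot. The sketch below is one common Wald-type statistic that keeps the two covariance matrices separate and refers the result to a chi-square distribution with P degrees of freedom; it is an illustration under those assumptions, not necessarily the exact statistic the book computes, and the helper function name is hypothetical.

```r
# Wald-type two-group test that does not pool the covariance matrices (a sketch)
# X1 and X2 are matrices of dependent variable scores for the two groups (rows = subjects)
wald_two_sample <- function(X1, X2) {
  d <- colMeans(X1) - colMeans(X2)
  V <- cov(X1) / nrow(X1) + cov(X2) / nrow(X2)
  W <- drop(t(d) %*% solve(V) %*% d)
  c(Wald = W, df = ncol(X1), p.value = pchisq(W, df = ncol(X1), lower.tail = FALSE))
}
# Example with the objects from the sketch above:
# wald_two_sample(Y[group == "Rogerian", ], Y[group == "Adlerian", ])
```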

Two Groups (Paired) Dependent Variable Mean Difference
The multivariate dependent t test is an extension of the univariate dependent t test with two or more
dependent variables. The data entry is important because you will need to calculate the mean difference
between the two groups on each dependent variable. The R code has been written to provide certain values
prior to the actual Hotelling T2 dependent t test. This includes printing out the difference scores, means, and
standard deviations. The R code is described at each step in a text box. The R code shows two different
approaches when conducting the multivariate dependent t test. The first approach is comparing the difference
scores between two groups. The two groups are fifth-grade boys and girls. The dependent variable was the
pop quiz test. The second approach is comparing all students on their difference scores. The pop quiz test was
given twice, once after instruction and again 2 weeks later. The teacher wanted to test memory retention of
the material taught in class. She hypothesized that students would not retain the information, and thus, they
would score lower on the second administration of the pop quiz. The teacher not only wanted to see if there
were differences between the boys and girls but also wanted to know if there was a difference overall for her
students, hence the two different multivariate dependent t-test approaches.

R Code: Hotelling T2 (Two Paired Dependent Variables)

Approach 1: Compare Boys and Girls Pop Quiz Difference Scores
In the first approach, we would first calculate the difference scores in each group. Then, we would calculate
the mean difference for each group. The R commands are as follows.
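The book's code and data appear as screenshots. The sketch below uses hypothetical Pre and Post Pop Quiz scores for 5 boys and 5 girls, chosen only so the group mean differences (0.4 and 3.4) and the overall Pre and Post means (10.1 and 12) match the values discussed in the text; the test statistics computed from these values will therefore differ from the book's output.

```r
# Hypothetical Pre and Post Pop Quiz scores (not the book's original data)
boys_pre   <- c(10, 10, 11, 10, 10);  boys_post  <- c( 8, 13, 11, 12,  9)
girls_pre  <- c( 9, 10, 10, 11, 10);  girls_post <- c(16, 15, 14, 12, 10)

boys_diff  <- boys_post  - boys_pre        # difference scores for the boys
girls_diff <- girls_post - girls_pre       # difference scores for the girls

mean(boys_diff);  sd(boys_diff)            # boys: mean difference = 0.4
mean(girls_diff); sd(girls_diff)           # girls: mean difference = 3.4
```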

We would then want to graph the dependent variable mean differences to visually inspect the magnitude of
the mean difference. The R commands are as follows:
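Continuing the sketch above, a simple bar plot displays the two mean differences:

```r
# Bar plot of the mean Pop Quiz difference (Post - Pre) for each group
barplot(c(Boys = mean(boys_diff), Girls = mean(girls_diff)),
        ylab = "Mean Pop Quiz difference (Post - Pre)")
```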

We can visually inspect the difference scores in each group with the following R command.
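Continuing the sketch, the individual difference scores can be listed for each group:

```r
# Print the difference scores by group
list(Boys = boys_diff, Girls = girls_diff)
```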

Finally, we compute the Hotelling T2 statistic separately on the difference scores for each group.
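A minimal sketch of this final step follows; the helper function is hypothetical (not from the book) and, with a single difference variable, the test is equivalent to a one-sample t test.

```r
# One-sample Hotelling T2 on the difference scores within each group (hypothetical helper function)
t2_one <- function(x, mu = 0) {
  x <- as.matrix(x); n <- nrow(x); p <- ncol(x)
  dev   <- colMeans(x) - mu
  T2    <- drop(n * t(dev) %*% solve(cov(x)) %*% dev)
  Fstat <- (n - p) / (p * (n - 1)) * T2
  c(T2 = T2, F = Fstat, df1 = p, df2 = n - p,
    p.value = pf(Fstat, p, n - p, lower.tail = FALSE))
}
t2_one(boys_diff)    # boys' difference scores
t2_one(girls_diff)   # girls' difference scores
```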

Approach 2: Compare All Students in Class on Pre and Post Scores


The Hotelling T2 test can be computed for omnibus difference scores for all subjects in the data set. We first
create the data set with the following R commands.
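Continuing with the same hypothetical scores, a sketch of Approach 2 places the Pre and Post scores for all 10 students side by side and tests the overall difference (reusing the t2_one() helper defined above):

```r
# Combine both groups into one data frame of Pre, Post, and difference scores
quiz <- data.frame(Pre  = c(boys_pre,  girls_pre),
                   Post = c(boys_post, girls_post))
quiz$Diff <- quiz$Post - quiz$Pre
quiz                  # Pre and Post scores side by side
colMeans(quiz)        # means for Pre, Post, and the difference
t2_one(quiz$Diff)     # omnibus test of the difference scores for all students
```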

The first approach conducted a multivariate dependent t test to test whether the fifth-grade boys differed on
their Pop Quiz difference scores compared with the girls Pop Quiz difference scores. The boys had a 0.4
mean difference, while the girls had a 3.4 mean difference. For the boys, Hotelling T2 = 0.2857, df1 = 1, df2 =
4, and p value = .6213, so we would retain the null hypothesis of no difference in Pop Quiz scores. For the
girls, Hotelling T2 = 3.341, df1 = 1, df2 = 4, p value = .1416, so we would retain the null hypothesis of no
difference in Pop Quiz scores. The teacher was pleased that there was no statistical difference between the
boys’ and girls’ Pop Quiz scores.

The second approach conducted a multivariate dependent t test to test whether all fifth-grade students in her
class differed in their Pop Quiz scores. The data frame shows the Pre and Post scores for the dependent
variables side by side. This helps our understanding that the mean difference is what is being tested for
statistical significance. For example, the Pop Quiz mean was 10.1 the first time it was administered (Pre), and
the Pop Quiz mean was 12 the second time it was administered (Post). So the mean difference is 12 − 10.1 =
1.9. The Hotelling T2 = 3.1574, df1 = 1, df2 = 9, and p value = .1093, so we would retain the null hypothesis of
no difference in Pop Quiz scores for all students. The teacher gave the same Pop Quiz both Pre and Post, so
her interest was in whether students retained the information she taught. Therefore, the teacher was pleased
that the students did retain the information; thus, no difference on average between the first and second
administration of the Pop Quiz was a good finding. In contrast, researchers often design a study with a
pretest, followed by a treatment, and then a posttest. In this type of research design, the researcher would
expect a statistically significant difference if the treatment was effective and changed students’ scores.

Power and Effect Size
Several factors affect the power of a statistical test to detect a mean difference. These factors are as follows:

1. Type I error rate (alpha level)


2. Sample size
3. Effect size (difference in groups on the dependent variable)
4. Population standard deviation (homogeneous or heterogeneous)
5. Directionality of hypothesis (one-tail test vs. two-tail test)

When planning a research study, we would select values for these five criteria to compute power
(http://www.cedu.niu.edu/~walker/calculators/). Alternatively, we could determine sample size by selecting
power and the other four criteria to compute the sample size needed for the study.

The impact of each of these factors on power is briefly described as follows:

Type I error: Probability of rejecting the null hypothesis when it is true (hypothesize that groups differ
but really don’t)
Sample size: The larger the sample, the more representative of the population
Effect size: The smaller the difference you want to detect, the larger the sample size needed
Population standard deviation: Homogeneous (smaller sample size); heterogeneous (larger sample size)
Directionality of hypothesis: A one-tailed test for a mean difference in a single direction has more power than a
two-tailed test for differences in both directions.

We should also be concerned with the Type II error rate, the counterpart of the Type I error
rate, which is defined as follows:

Type II error: Probability of accepting the null hypothesis when it is false (stated that groups don’t differ
but really do)

A Priori Power Estimation
A researcher can determine power when planning a study, which is an a priori determination, by selecting
values listed above. Power is a statement of how probable you want to be in detecting a mean difference, for
example, 80% probability of rejecting a null hypothesis when false. A popular free software, G*Power 3,
determines the a priori power for different statistical tests (http://www.psycho.uni-
duesseldorf.de/abteilungen/aap/gpower3/).

G*Power 3 has options for the Hotelling T2 one group. We would enter the following values to determine the
sample size: effect size (1.2), Type I alpha (.05), power (.80), and number of response variables (2). Sample
size was 10, which is the number of subjects in the single-sample multivariate t test. We could detect a mean
difference of 1.2 (effect size); our results indicated Y1 = −4.8 and Y2 = 4.4, which was greater than the
specified effect size (1.2).
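These G*Power calculations can be approximated in R with the noncentral F distribution. The sketch below assumes the effect size is expressed as the Mahalanobis distance (delta) between the true and hypothesized mean vectors for the one-group case; G*Power's exact parameterization may differ, so treat this as an illustration rather than a replication of the dialog-box results. The function name is hypothetical.

```r
# A priori power for the single-sample Hotelling T2 via the noncentral F distribution (a sketch)
t2_power_one_group <- function(n, p, delta, alpha = 0.05) {
  df1 <- p; df2 <- n - p
  ncp <- n * delta^2                       # noncentrality for Mahalanobis distance delta
  Fcrit <- qf(1 - alpha, df1, df2)         # critical F at the chosen alpha
  pf(Fcrit, df1, df2, ncp = ncp, lower.tail = FALSE)
}
t2_power_one_group(n = 10, p = 2, delta = 1.2)   # values from the single-sample example
```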

Note: Criteria in the dialog boxes can be varied to achieve different results for effect size, power, and number
of response variables.

G*Power 3 also has options for the Hotelling T2 two independent group. We would enter the following
values to determine the sample size: effect size (3.6), Type I alpha (.05), power (.80), sample size ratio (n1 =
3/n2 = 6), and number of response variables (2). Effect size was selected to be 3.6 based on Y1 mean difference
of 3.0 between Rogerian (mean = 2) and Adlerian (mean = 5) counselors on counseling effectiveness, and Y2
mean difference of 4.0 between Rogerian (mean = 4) and Adlerian (mean = 8) counselors on counseling
satisfaction. The other criteria were selected as a Type I error rate of α = .05, power = .80, number of response
variables = 2, and ratio of sample sizes (3/6) = .5. Sample size was given as Group 1 = 5 and Group 2 = 3 for
power = .89, so we had sufficient sample size and power given our criteria.

G*Power also has other types of analysis options, which are shown in the pull-down menu. The dialog box
below, for example, computes power based on alpha, sample size, and effect size. I input the values for the
sample sizes of the two groups, number of response variables, and effect size, which yielded power = .805.

Effect Size Measures
The univariate effect size measures are generally given when reporting the general linear model results. These
popular univariate effect size measures (how many standard deviation units the group means are separated by)
are as follows:

1. Cohen’s d

Cohen's $d = (\mu_1 - \mu_2)/\sigma$, where $\sigma$ is the common population standard deviation.
2. Partial eta-squared
$$\eta_P^2 = \frac{df_h \times F}{df_h \times F + df_e}.$$

Note: dfh is the degrees of freedom for the hypothesis, and dfe is the degrees of freedom for the error term. Partial eta-squared values of .01, .06, and .14 indicate small, medium, and large effect sizes, respectively.

The Mahalanobis D2 measure is commonly reported as a multivariate effect size measure. It uses the vector of
mean differences and the common population covariance matrix. The Mahalanobis D2 measure is calculated
as follows:

3. Mahalanobis D2 (two-group means)

$$D^2 = (\mu_1 - \mu_2)' \Sigma^{-1}(\mu_1 - \mu_2),$$

where the multivariate vector of means is used with the variance–covariance matrix.
$$\hat{D}^2 = (\bar{Y}_1 - \bar{Y}_2)' S^{-1}(\bar{Y}_1 - \bar{Y}_2).$$

The Mahalanobis D2 is a measure of the separation of the independent group means without using the sample
sizes of the groups (Hotelling T2 without sample size). It yields a value that indicates the distance in space
between the dependent variable means.

You can obtain the F and T2 values from the R code and then calculate the D2 effect size measure. The
calculations using the R output from the multivariate independent two-group results would be as follows:
$$F = \left(\frac{df_2}{df_1}\right) T^2 = \left(\frac{6}{2}\right) 3 = 9$$

$$T^2 = \left(\frac{df_1}{df_2}\right) F = \left(\frac{2}{6}\right) 9 = 3$$

$$D^2 = \frac{N\,T^2}{n_1 n_2} = \frac{9(3)}{3(6)} = 1.5$$

The D2 effect size = 1.5 is considered a large effect size.
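The same conversion can be checked in R with the values reported for the two independent group example:

```r
# Mahalanobis D2 effect size from the reported T2 and group sizes
n1 <- 3; n2 <- 6; T2 <- 3
D2 <- (n1 + n2) * T2 / (n1 * n2)   # 9 * 3 / (3 * 6) = 1.5
D2
```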

Reporting and Interpreting
A researcher should provide the descriptive statistics for the Hotelling T2 test of mean differences (means,
standard deviations, and correlations). In addition, the Box M test of equal covariance matrices should be
reported. This is followed by reporting the Hotelling T2, degrees of freedom, and p value. The power and
effect size information should also be given when possible. It is important to report these values along with
the hypothesis or research question. An examination of published journal articles in your field will guide what
information to report when conducting a Hotelling T2 analysis. A basic write-up is provided to help with that
understanding.

Rogerian and Adlerian counselors were compared on two dependent measures: counseling effectiveness and
counseling satisfaction. The means for Adlerian counselors were higher than Rogerian counselors on the two
dependent variables. A Hotelling T2 two independent group analysis was conducted which indicated a
statistically significant mean difference between the two groups (T2 = 3, df = 2, 6, p = .016) for the two
dependent variables. Adlerian counselors had higher mean scores on counseling effectiveness and counseling
satisfaction (5 and 8) than Rogerian counselors (2 and 4). The multivariate results indicated a significant
dependent variable joint effect. The multivariate effect size = 1.5 and power = .80.

Summary
This chapter presented a two-group multivariate test of mean differences on two or more dependent variables.
The Hotelling T2 test can be conducted on a single sample, mean difference between two independent
groups, or mean difference of a paired group. It is considered an extension of the univariate t-test method.
The assumptions and practical examples demonstrated how R functions can be used to test the mean
differences.

An important concept was also presented in the chapter, namely, power and effect size. The factors that affect
power were illustrated using G*Power 3 software. The software permits the determination of sample size
and/or power for the different multivariate tests. Additionally, the discussion of effect size measures relates the
importance of looking beyond statistical significance to the practical importance and meaningfulness of
interpretation given by an effect size measure. The relation and formula to convert F and T2 into a D2 effect
size is important, especially when the statistical output does not readily provide an effect size measure.

Exercises
1. Create two data vectors and merge them into one using R code.
2. Create a single membership vector for two groups.

3. Create an R code for data analysis in the Hotelling T2 two independent group example and show results.

Web Resources
Box’s M test

http://en.wikiversity.org/wiki/Box’s_M

Dunn–Bonferroni

http://en.wikipedia.org/wiki/Bonferroni_correction

G*Power 3

http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/

Hotelling Biography

http://en.wikipedia.org/wiki/Harold_Hotelling

Hotelling T2 R tutorial

http://www.uni-kiel.de/psychologie/rexrepos/posts/multHotelling.html

Levene’s test

http://en.wikipedia.org/wiki/Levene’s_test

Power and effect size

http://www.cedu.niu.edu/~walker/calculators/

References
Hotelling, H. (1931). The generalization of student’s ratio. Annals of Mathematical Statistics, 2(3), 360–378.

Schumacker, R. (2014). Learning statistics using R. Thousand Oaks, CA: Sage.

Stevens, J. P. (2009). Applied multivariate statistics for the social sciences (5th ed.). New York, NY:
Routledge.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston, MA: Allyn &
Bacon.

4 Multivariate Analysis of Variance

Assumptions
Independent Observations
Normality
Equal Variance–Covariance Matrices
Summary
MANOVA Example: One-Way Design
MANOVA Example: Factorial Design
Effect Size
Reporting and Interpreting
Summary
Exercises
Web Resources
References

Wilks’s Lambda was the first MANOVA test statistic developed and is very important for several multivariate
procedures in addition to MANOVA. The best known approximation for Wilks’s Lambda was derived by C.
R. Rao. The basic formula is as follows:
$$\text{Wilks's Lambda} = \Lambda = \frac{|E|}{|H| + |E|}$$

The summary statistics, Pillai’s trace, Hotelling–Lawley’s trace, Wilks’s Lambda, and Roy’s largest root
(eigenvalue) in MANOVA are based on the eigenvalues of HE⁻¹. In the two-group case, these summary statistics are all
equivalent to the Hotelling T2 statistic. Consequently, MANOVA can be viewed as an extension of the
multivariate t-test similar to the analysis of variance (ANOVA) being an extension of the univariate t-test.

The Λ statistic is a ratio based on the hypothesis and error sums of squares: H is the hypothesis sum of squares and cross products matrix, and E is the error sum of squares and cross products matrix. This is the major reason why statistical packages such as SPSS and SAS print out the eigenvalues and eigenvectors of HE⁻¹.


C. R. Rao (Calyampudi Radhakrishna Rao, born September 10, 1920) is an Indian-born naturalized American mathematician and statistician. He holds an MA in both mathematics and statistics. He worked in India at the Indian Statistical
Institute (ISI) for 40 years and founded the Indian Econometric Society and the Indian Society for Medical Statistics. He worked at
the Museum of Anthropology and Archeology at Cambridge University, the United Kingdom, using statistical methodology
developed by P. C. Mahalanobis at ISI. He earned his PhD in 1948 from Cambridge University, with R. A. Fisher as his thesis
advisor. In 1965, the university awarded him the prestigious ScD degree based on a peer review of his research contributions to
statistics. He has received 31 honorary doctoral degrees from universities in 18 countries. After 40 years of working in India, he
moved to the United States and worked for another 25 years at the University of Pittsburgh and Pennsylvania State University,
where he served as Director of the Center for Multivariate Analysis. He is emeritus professor at Pennsylvania State University and
research professor at the University of Buffalo. Dr. Rao has received the distinguished R. A. Fisher Lectureship, Wilks Memorial
Award, and the National Medal of Science for Mathematics and Computer Science.

MANOVA Assumptions
The independence of observations is an assumption that is sometimes mentioned in a statistics book, but not
covered in-depth, although it is an important point when covering probability in statistics. MANOVA is
most useful when dependent variables have moderate correlation. If dependent variables are highly correlated,
it could be assumed that they are measuring the same variable or construct. This could also indicate a lack of
independence of observations. MANOVA also requires normally distributed variables, which we can test with
the Shapiro–Wilk test. MANOVA further requires equal variance–covariance matrices between groups to
assure a fair test of mean differences, which we can test with the Box M test. The three primary assumptions
in MANOVA are as follows:

1. Observations are independent


2. Observations are multivariate normally distributed on dependent variables for each group
3. Population covariance matrices for dependent variables are equal

Independent Observations
If individual observations are potentially dependent or related, as in the case of students in a classroom, then it
is recommended that an aggregate mean be used, the classroom mean (Stevens, 2009). The intraclass
correlation (ICC) can be used to test whether observations are independent. Shrout and Fleiss (1979)
provided six different ICC correlations. Their work expressed reliability and rater agreement under different
research designs. Their first ICC correlation provides the basis for determining if individuals are from the
same class—that is, no logical way of distinguishing them. If variables are logically distinguished, for example,
items on a test, then the Pearson r or Cronbach alpha coefficients are typically used. The first ICC formula
for single observations is computed as follows:
$$\text{ICC} = \frac{MS_b - MS_w}{MS_b + (n - 1)MS_w}.$$

The R psych package contains the six ICC correlations developed by Shrout and Fleiss (1979). There is an R
package, ICC, that gives the MSb and MSw values in the formula, but it does not cover all of the six ICC
coefficients with p values.

Stevens (2009, p. 215) provides data on three teaching methods and two dependent variables (achievement 1
and achievement 2). The R commands to install the package and load the library with the data are as follows:
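The book shows this code as a screenshot. A minimal sketch follows; the data frame name (stevens) and the column names (method, ach1, ach2) are assumptions about how the Stevens (2009) data were read in, not names taken from the book.

```r
# install.packages("psych")   # if not already installed
library(psych)

# Assumed data frame: stevens with columns method, ach1, ach2
ICC(stevens[, c("ach1", "ach2")])        # the six Shrout and Fleiss intraclass correlations
cor.test(stevens$ach1, stevens$ach2)     # Pearson r with its test of significance
```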

The first ICC (ICC1 = .99) indicates a high degree of intracorrelation or similarity of scores. The Pearson r =
.99, p < .0001, using the cor.test() function, so the correlation is statistically significant, and we may conclude
that the dependent variables are related and essentially are measuring the same thing.

We desire some dependent variable correlation to measure the joint effect of dependent variables (rationale for
conducting multivariate analysis); however, too much dependency affects our Type I error rate when
hypothesis testing for mean differences. Whether using the ICC or Pearson correlation, it is important to
check on this violation of independence because dependency among observations causes the alpha level (.05)
to be several times greater than expected. Recall, when Pearson r = 0, observations are considered independent
—that is, not linearly related.

Normality
MANOVA generally assumes that variables are normally distributed when conducting a multivariate test of
mean differences. It is best however to check both the univariate and the multivariate normality of variables.
As noted by Stevens (2009), not all variables have to be normally distributed to have a robust MANOVA F
test. Slight departures from skewness and kurtosis do not have a major impact on the level of significance and
power of the F test (Glass, Peckham, & Sanders, 1972). Data transformations are available to correct the
slight effects of skewness and kurtosis (Rummel, 1970). Popular data transformations are the log, arcsin, and
probit transformations depending on the nature of the data skewness and kurtosis.

The example uses the R nortest package, which contains five normality tests to check the univariate normality
of the dependent variables. The data frame, depvar, was created to capture only the dependent variables and
named the variables ach1 and ach2.

Next, you can run the five tests for both the ach1 and ach2 dependent variables.
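A sketch of the five univariate normality tests follows, again assuming the Stevens data are in a data frame named stevens with columns ach1 and ach2:

```r
# install.packages("nortest")
library(nortest)

depvar <- stevens[, c("ach1", "ach2")]   # keep only the two dependent variables

# Run the five nortest normality tests on each dependent variable
lapply(depvar, function(x)
  list(AndersonDarling = ad.test(x),
       CramerVonMises  = cvm.test(x),
       Lilliefors      = lillie.test(x),
       PearsonChiSq    = pearson.test(x),
       ShapiroFrancia  = sf.test(x)))
```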

The univariate normality results are shown in Table 4.1. The variable ach1 was indicated as being normally
distributed across all five normality tests. The variable ach2 was also indicated as being normally distributed
across all five normality tests.

The R mvnormtest package with the Shapiro–Wilk test can be used to check for multivariate normality. First,
install and load the package. Next, transpose the depvar data set that contained only the dependent variables
ach1 and ach2. Finally, use the transposed data set stevensT in the mshapiro.test() function. The R commands
were as follows:
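A sketch of the multivariate check follows (the mvnormtest package expects variables in rows, which is why the data are transposed):

```r
# install.packages("mvnormtest")
library(mvnormtest)

stevensT <- t(as.matrix(depvar))   # transpose: variables in rows, cases in columns
mshapiro.test(stevensT)            # Shapiro-Wilk test of multivariate normality
```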

Results indicated that the two dependent variables are jointly distributed as multivariate normal (W = .949, p =
.07). We have therefore met the univariate and multivariate assumption of normally distributed dependent
variables.

Equal Variance–Covariance Matrices
The Box M test can be used to test the equality of the variance–covariance matrices across the three teaching
methods in the data set. We should first view the three variance–covariance matrices for each method. You
can use the following set of R commands to extract and print each set of data.
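A sketch of the extraction step, assuming the grouping variable is named method:

```r
# Print the dependent variable scores separately for each teaching method
split(stevens[, c("ach1", "ach2")], stevens$method)
```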

Next, create the variance–covariance matrix for each method along with the determinant of the matrices. The
following set of R commands were used.
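A sketch of the covariance matrices and their determinants by method:

```r
# Variance-covariance matrix and its determinant for each teaching method
by(stevens[, c("ach1", "ach2")], stevens$method, function(g) {
  S <- cov(g)
  list(covariance = S, determinant = det(S))
})
```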

The variance–covariance matrices for all three methods have positive determinants (greater than zero), so
parameter estimates can be obtained. We can now check for the assumption of equal variance–
covariance matrices between the three methods.

The biotools package has a boxM() function for testing the equality of covariance matrices between groups.
The package can be installed from the main menu or use the install.packages() function. The boxM()
function requires specifying a group variable as a factor. The R commands are as follows:
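A sketch of the Box M test with the assumed column names:

```r
# install.packages("biotools")
library(biotools)

boxM(stevens[, c("ach1", "ach2")], factor(stevens$method))   # test of equal covariance matrices
```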

The Box M results indicate that the three methods have similar variance–covariance matrices (chi-square =
4.17, df = 6, p = .65).

Summary
Three key assumptions in MANOVA are independent observations, normality, and equal variance–covariance
matrices. These were calculated using R commands. The data set was from Stevens (2009, p. 215), and it
indicated three teaching methods and two dependent variables. The ICC and Pearson r correlations both
indicated a high degree of dependency between the two dependent variables (ICC1 = .99; r = .99). The
research design generally defines when the ICC versus the Pearson r is reported. A rationale for using
MANOVA is to test the joint effects of dependent variables, however, when the dependent variables are
highly correlated, it increases the Type I error rate. The univariate and multivariate normality assumptions for
the two dependent variables were met. In addition, the assumption of equal variance–covariance matrices
between the three methods was met. We will now proceed to run the MANOVA analysis using the data.

MANOVA Example: One-Way Design
A basic one-way MANOVA example is presented using the Stevens (2009, p. 215) data set that contains
three methods (group) and two dependent variables (achievement1 and achievement2). First, install and load a
few R packages for the MANOVA analysis, which permits use of Type III SS (R by default uses Type I SS),
and a package to provide descriptive statistics. The manova() function is given in the base stats package.

The MANOVA R commands to test for joint mean differences between the groups are as follows:
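A sketch of the one-way MANOVA and its four summary statistics, using the assumed column names from the sketches above:

```r
# One-way MANOVA with two dependent variables and teaching method as the grouping factor
fit <- manova(cbind(ach1, ach2) ~ factor(method), data = stevens)

summary(fit, test = "Wilks")              # Wilks's Lambda
summary(fit, test = "Pillai")             # Pillai-Bartlett trace
summary(fit, test = "Hotelling-Lawley")   # Hotelling-Lawley trace
summary(fit, test = "Roy")                # Roy's largest root

summary.aov(fit)                          # follow-up univariate ANOVAs for each dependent variable
```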

The four different summary statistics are shown in Table 4.2. Wilks's Λ is the product of the eigenvalues of WT⁻¹.
The Hotelling–Lawley trace is the sum, and Roy's largest root is the largest, of the eigenvalues of BW⁻¹, which is an
extension of the univariate F statistic (F = MSb/MSw). The Pillai–Bartlett trace is the sum of the
eigenvalues of BT⁻¹. The matrices represent the multivariate expression for SS within (W), SS between (B),
and SS total (T). Olson (1976) reported that the power difference between the four types was generally small.
I prefer to report the Wilks or Hotelling–Lawley test statistic when the assumption of equal variance–
covariance among the groups is met. They tend to fall in-between the p value range of the other two
multivariate statistics. All four types of summary statistics indicated that the three groups (teaching methods)

108
had a joint dependent variable mean difference.

The summary.aov( ) function will yield the ANOVA univariate statistics for each of the dependent variables.
The dependent variable, V3 (ach1), indicated that the three groups differed in their achievement1 group
means (F = 11.68, p < .001). The dependent variable, V4 (ach2), indicated that the three groups differed in
their achievement2 group means (F = 11.08, p < .001).

To make a meaningful interpretation beyond the univariate and multivariate test statistics, a researcher would
calculate the group means and/or plot the results.
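A sketch of the descriptive statistics by group, using the psych package:

```r
library(psych)
describeBy(stevens[, c("ach1", "ach2")], group = stevens$method)   # means and SDs by teaching method
```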

The descriptive statistics for the two dependent variables means by the three teaching methods shows the
differences in the groups. From a univariate ANOVA perspective, the dependent variable means are not the
same. However, our interest is in the joint mean difference between teaching methods. The first teaching
method had an average achievement of 24.335 ((23.17 + 25.50)/2). The second teaching method had an average
achievement of 65.75 ((65.33 + 66.17)/2). The third teaching method had an average achievement of 63.315
((63.25 + 63.38)/2). The first teaching method, therefore, did not achieve the same level of results for students
as teaching methods 2 and 3.

MANOVA Example: Factorial Design
A research design may include more than one group membership variable, as was the case in the previous
example. The general notation is Factor A, Factor B, and Interaction A * B in a fixed effects factorial analysis
of variance. This basic research design is an extension of the univariate design with one dependent variable to
the multivariate design with two or more dependent variables. If a research design has two groups (factors),
then the interest is in testing for an interaction effect first, followed by interpretation of any main effect mean
differences. Factorial MANOVA is used when a research study has two factors, for example, gender and
teaching method, with two or more dependent variables.

The important issues to consider when conducting a factorial MANOVA are as follows:

Two or more classification variables (treatments, gender, teaching methods).


Joint effects (interaction) of classification variables (independent variables)
More powerful tests by reducing error variance (within-subject SS)
Requires adjustment due to unequal sample sizes (Type I SS vs. Type III SS)

The last issue is important because R functions currently default to a Type I SS (balanced designs) rather than
a Type III SS (balanced or unbalanced designs). The Type I SS will give different results depending on the
variable entry order into the equation, that is, Y = A + B + A * B versus Y = B + A + A * B. Schumacker (2014,
pp. 321–322) provided an explanation of the different SS types in R.

A factorial MANOVA example will be given to test an interaction effect using the Stevens (2009, p. 215)
data set with a slight modification in the second column, which represents a class variable. For the first
teaching method, the class variable will be corrected to provide 3 students in one class and 3 students in
another class. The data values are boldfaced in the R code and output.

We now have three teaching methods (method—Factor A) and two class types (class—Factor B) with two
dependent variables (ach1 and ach2). The R commands to conduct the factorial multivariate analysis with the
summary statistics are listed.
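A sketch of the factorial MANOVA follows; stevens2 is an assumed name for the modified data frame, with method and class treated as factors:

```r
# Factorial MANOVA: method (3 levels) by class (2 levels) with two dependent variables
fit2 <- manova(cbind(ach1, ach2) ~ factor(method) * factor(class), data = stevens2)

summary(fit2, test = "Wilks")
summary(fit2, test = "Pillai")
summary(fit2, test = "Hotelling-Lawley")
summary(fit2, test = "Roy")
```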

The four multivariate summary statistics are all in agreement that an interaction effect is not present. The
main effect for teaching method was statistically significant with all four summary statistics in agreement,
while the main effect for classes was not statistically significant. The Hotelling–Lawley and Roy summary
values are the same because they are based on the eigenvalues of the same matrix, BW⁻¹.
The Wilks Λ is based on WT⁻¹, while Pillai is based on BT⁻¹, so they would have different values.

In MANOVA, the Type I SS will be different depending on the order of variable entry. The default is Type I
SS, which is generally used with balanced designs. We can quickly see the two different results where the SS
are partitioned differently depending on the variable order of entry. Two different model statements are given
with the different results.
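A sketch of the two orderings (assuming method and class are already factors in stevens2):

```r
# Type I (sequential) SS depends on the order of entry
fit.model1 <- manova(cbind(ach1, ach2) ~ method + class + method:class, data = stevens2)
fit.model2 <- manova(cbind(ach1, ach2) ~ class + method + method:class, data = stevens2)

summary.aov(fit.model1)   # method entered first
summary.aov(fit.model2)   # class entered first
```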

The first model (fit.model1) has the independent variables specified as follows: method + class + method * class.
This implies that the method factor is entered first. The second model (fit.model2) has the independent
variables specified as follows: class + method + method * class. This implies that the class factor is entered first.
The Type I SS are very different in the output due to the partitioning of the SS. Both Type I SS results show
only the method factor statistically significant; however, in other analyses, the results could be affected.

We can evaluate any model differences with Type II SS using the anova() function. The R command is as
follows:
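A sketch of the comparison; because the two orderings fit the same model, the multivariate test of their difference is zero:

```r
anova(fit.model1, fit.model2)   # compare the two fitted multivariate models
```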

The results of the model comparisons indicate no difference in the model results (Pillai = 0). This is observed
with the p values for the main effects and interaction effect being the same. If the Type II SS were statistically
different based on the order of variable entry, then we would see a difference when comparing the two
different models.

The Type III SS is used with balanced or unbalanced designs, especially when testing interaction effects.
Researchers today are aware that analysis of variance requires balanced designs, hence reliance on Type I or
Type II SS. Multiple regression was introduced in 1964 with a formula that permitted Type III SS, thus
sample size weighting in the calculations. Today, the general linear model in most statistical packages (SPSS,
SAS, etc.) has blended the analysis of variance and multiple regression techniques with the Type III SS as
the default method. In R, we need to fit the model with the lm() or glm() function and then use the Anova() function in the car package to obtain the Type III
SS. We can now compare the Type II and Type III SS for the model equation in R by the following
commands.
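A sketch of the Type II versus Type III comparison with the car package (fitting the multivariate model with lm() and passing it to Anova()):

```r
# install.packages("car")
library(car)

fit.lm <- lm(cbind(ach1, ach2) ~ method + class + method:class - 1, data = stevens2)
Anova(fit.lm, type = "II")    # Type II SS
Anova(fit.lm, type = "III")   # Type III SS
```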

Note: The value −1 in the regression equation is used to omit the intercept term; this permits a valid
comparison of analysis of variance results.

The Type II and Type III SS results give different results. In Type II SS partitioning, the method factor was
first entered, thus SS(A|B) for method (Factor A), followed by SS(B|A) for class (Factor B), then SS(AB|B, A)
for interaction effect. In Type III SS partitioning, the SS(A|B, AB) for the method effect (Factor A) is
partitioned, followed by SS(B|A, AB) for class effect (Factor B). Many researchers support the Type III SS
partitioning with unbalanced designs that test interaction effects. If the research design is balanced with
independent (orthogonal) factors, then the Type II SS and Type III SS would be the same. In this
multivariate analysis, both main effects (method and class) are statistically significant when using Type III SS.

A researcher would typically conduct a post hoc test of mean differences and plot trends in group means after
obtaining significant main effects. However, the TukeyHSD( ) and plot( ) functions currently do not work
with a MANOVA model fit function. Therefore, we would use the univariate functions, which are discussed
in Schumacker (2014). The descriptive statistics for the method and class main effects can be provided by the
following:
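A sketch of the descriptive statistics for each main effect:

```r
library(psych)
describeBy(stevens2[, c("ach1", "ach2")], group = stevens2$method)   # by teaching method
describeBy(stevens2[, c("ach1", "ach2")], group = stevens2$class)    # by class
```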

Effect Size
An effect size permits a practical interpretation beyond the level of statistical significance (Tabachnick &
Fidell, 2007). In multivariate statistics, this is usually reported as eta-square, but recent advances have shown
that partial eta-square is better because it takes into account the number of dependent variables and the
degrees of freedom for the effect being tested. The effect size is computed as follows:
$$\eta^2 = 1 - \Lambda.$$

Wilks's Lambda (Λ) is the proportion of variance not explained; therefore, 1 − Λ is the proportion of variance explained (the
effect size). The partial eta-square is computed as follows:
$$\text{Partial } \eta^2 = 1 - \Lambda^{1/S},$$

where S = min (P, dfeffect); P = number of dependent variables and dfeffect is the degrees of freedom for the
effect tested (independent variable in the model).

An approximate F test is generally reported in the statistical results. This is computed as follows:
$$F = \frac{1 - Y}{Y}\left(\frac{df_2}{df_1}\right),$$

where $Y = \Lambda^{1/S}$.

For the one-way MANOVA, we computed the following values: Λ = .40639 and approx F = 4.549, with df1 =
4 and df2 = 32. Y = Λ^(1/S), where S = min(P, dfeffect) = min(2, 2) = 2, so Y = .6374.

The approx F is computed as follows:


$$\text{Approx } F = \frac{1 - Y}{Y}\left(\frac{df_2}{df_1}\right) = \frac{.3625}{.6374}\left(\frac{32}{4}\right) = .5687(8) = 4.549.$$

The partial η2 is computed as follows:


$$\text{Partial } \eta^2 = 1 - \Lambda^{1/S} = 1 - (.40639)^{1/2} = .3625.$$

Note: Partial η2 is the same as (1−Y) in the numerator of the approximate F test.

The effect size indicates that 36% of the variance in the combination of the dependent variables is accounted
for by the method group differences.
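The same arithmetic can be reproduced in R with the values reported above:

```r
# Partial eta-squared and approximate F for the one-way MANOVA (values from the text)
lambda <- 0.40639; P <- 2; df.effect <- 2; df1 <- 4; df2 <- 32
S <- min(P, df.effect)
Y <- lambda^(1 / S)                     # 0.6374
partial.eta2 <- 1 - Y                   # 0.3625
approx.F <- (1 - Y) / Y * (df2 / df1)   # 4.549
c(Y = Y, partial.eta2 = partial.eta2, approx.F = approx.F)
```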

For the factorial MANOVA (Type III SS), both the method and class main effects were statistically significant,
thus each contributing to explained variance. We first compute the following values for the method effect: Λ =
.50744 and approx F = 7.7653, with df1 = 2 and df2 = 16. Y = Λ^(1/S), where S = min(P, dfeffect) = min(2, 1) = 1, so Y =
.50744.

The approx F (method) is computed as follows:


$$\text{Approx } F = \frac{1 - Y}{Y}\left(\frac{df_2}{df_1}\right) = \frac{.492}{.507}\left(\frac{16}{2}\right) = .97(8) = 7.76.$$

The partial η2 is computed as follows:


$$\text{Partial } \eta^2 = 1 - \Lambda^{1/S} = 1 - (.50744)^{1/1} = .493.$$

The effect size indicates that 49% of the variance in the combination of the dependent variables is accounted
for by the method group differences.

For the class effect: Λ = .67652 and approx F = 3.8253, with df1 = 2 and df2 = 16. Y = Λ^(1/S), where S = min(P, dfeffect)
= min(2, 1) = 1, so Y = .67652.

The approx F (class) is computed as follows:


$$\text{Approx } F = \frac{1 - Y}{Y}\left(\frac{df_2}{df_1}\right) = \frac{.323}{.676}\left(\frac{16}{2}\right) = .477(8) = 3.82.$$

The partial η2 is computed as follows:


$$\text{Partial } \eta^2 = 1 - \Lambda^{1/S} = 1 - (.6765)^{1/1} = .323.$$

The effect size indicates that 32% of the variance in the combination of the dependent variables is accounted
for by the class group differences.

The effect sizes indicated 49% (method) and 32% (class), respectively, for the variance explained in the
combination of the dependent variables. The effect size (explained variance) increased when including the
independent variable, class, because it reduced the SS error (amount of unknown variance). The interaction
effect, class:method, was not statistically significant. Although not statistically significant, it does account for
some of the variance in the dependent variables. Researchers have discussed whether nonsignificant main
and/or interaction effects should be pooled back into the error term. In some situations, this might increase
the error SS causing one or more of the remaining effects in the model to now become nonsignificant. Some
argue that the results should be reported as hypothesized for their research questions. The total effect was 49%
+ 32% or 81% explained variance.

Reporting and Interpreting
When testing for interaction effects with equal or unequal group sizes, it is recommended that Type III SS be
reported. The results reported in journals today generally do not require the summary table for the analysis of
variance results. The article would normally provide the multivariate summary statistic and a table of group
means and standard deviations for the method and class factors. The results would be written in a descriptive
paragraph style. The results would be as follows:

A multivariate analysis of variance was conducted for two dependent variables (achievement1 and
achievement2). The model contained two independent fixed factors (method and class). There were three levels
for method and two levels for class. Student achievement was therefore measured across three different
teaching methods in two different classes. The assumptions for the multivariate analysis were met; however,
the two dependent variables were highly correlated (r = .99); for multivariate normality, Shapiro–Wilk =
0.9493, p = .07; and for the Box M test of equal variance–covariance matrices, chi-square = 4.17, df = 6, p =
.65. The interaction hypothesis was not supported; that is, the different teaching methods did not affect
student achievement in the class differently (F = 2.45, df (2, 16), p = .12). The main effects for method and class
however were statistically significant (F = 7.77, df (2, 16), p = .004 and F = 3.83, df (2, 16), p = .04,
respectively). The partial eta-squared values were .49 for the method effect and .32 for the class effect, which
are medium effect sizes. The first teaching method had much lower joint mean differences in student
achievement than the other two teaching methods. The first class had a lower joint mean difference than the
second class (see Table 4.3).

Summary
This chapter covered the assumptions required to conduct multivariate analysis of variance, namely,
independent observations, normality, and equal variance–covariance matrices of groups. MANOVA tests for
mean differences in three or more groups with two or more dependent variables. A one-way and factorial
design was conducted using R functions. An important issue was presented relating to the type of SS used in the
analyses. You will obtain different results, and possibly nonsignificant effects, depending on how the SS is
partitioned in a factorial design. The different model fit criteria (Wilks, Pillai, Hotelling–Lawley, Roy),
depending on the ratio of the SS, was also computed and discussed.

The eta square and partial eta square were presented as effect size measures. These indicate the amount of
variance explained in the combination of dependent variables for a given factor. The results could vary slightly
depending on whether nonsignificant variable SS is pooled back into the error term. The importance of a
factorial design is to test interaction, so when interaction is not statistically significant, a researcher may rerun
the analysis excluding the test of interaction. This could result in main effects not being statistically
significant.

Exercises
1. Conduct a one-way multivariate analysis of variance
a. Input Baumann data from the car library

b. List the dependent and independent variables

Dependent variables: post.test.1; post.test.2; and post.test.3

Independent variable: group

c. Run MANOVA using manova() function

Model: cbind(post.test.1, post.test.2, post.test.3) ~ group


d. Compute the MANOVA summary statistics for Wilks, Pillai, Hotelling–Lawley, and Roy
e. Explain the results
2. Conduct a factorial MANOVA
a. Input Soils data from the car library
b. List the dependent and independent variables in the Soils data set
c. Run MANOVA model using the lm() function.
i. Dependent variables (pH, N, Dens, P, Ca, Mg, K, Na, Conduc)
ii. Independent variables (Block, Contour, Depth)
iii. MANOVA model: ~ Block + Contour + Depth + Contour * Depth − 1
d. Compute the MANOVA summary statistics for Wilks, Pillai, Hotelling–Lawley, and Roy
e. Explain the results using describeBy() function in psych package
3. List all data sets in R packages.

Web Resources
Hotelling T2

http://www.uni-kiel.de/psychologie/rexrepos/posts/multHotelling.html

Quick-R website

http://www.statmethods.net

References
Glass, G., Peckham, P., & Sanders, J. (1972). Consequences of failure to meet assumptions underlying the
fixed effects analysis of variance and covariance. Review of Educational Research, 42, 237–288.

Olson, C. L. (1976). On choosing a test statistic in multivariate analysis of variance. Psychological Bulletin,
83(4), 579–586.

Rummel, R. J. (1970). Applied factor analysis. Evanston, IL: Northwestern University Press.

Schumacker, R. E. (2014). Learning statistics using R. Thousand Oaks, CA: Sage.

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological
Bulletin, 86(2), 420–428.

Stevens, J. P. (2009). Applied multivariate statistics for the social sciences (5th ed.). New York, NY:
Routledge (Taylor & Francis Group).

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston, MA: Allyn &
Bacon, Pearson Education.

5 Multivariate Analysis of Covariance

Assumptions
Multivariate Analysis of Covariance
MANCOVA Example
Dependent Variable: Adjusted Means
Reporting and Interpreting
Propensity Score Matching
Summary
Web Resources
References

The ANCOVA technique adjusts group means for the influence of other variables not controlled in the
study, which are called extraneous variables. The extraneous variables are assumed to influence variation in the
dependent variable and are therefore controlled by statistical adjustment, since they are not controlled by random
assignment. Random assignment in experimental research designs controls for bias in subject selection and
other threats to the internal validity of the research design, which is not present in quasiexperimental and
other types of nonexperimental research designs (Campbell & Stanley, 1966). The ANCOVA assumptions
are more stringent than the ANOVA assumptions.


William Gemmell Cochran (July 15, 1909, to March 29, 1980) was born in Scotland and spent much of his career in the United
States. He attended Glasgow University, receiving an MA degree in 1931, and attended Cambridge next, but never received a
doctorate, choosing instead to follow Frank Yates to the Rothamsted Experimental Station. Later, during the end of his career, he
did receive honorary doctoral degrees from the University of Glasgow in 1970 and Johns Hopkins University in 1975. He was
influenced by John Wishart (Wishart distribution), as well as R. A. Fisher (experimental design) and Frank Yates (Yates correction
factor in chi-square), with whom he worked at the Rothamsted Experimental Station, the United Kingdom. W. G. Cochran also
worked with George Snedecor and Gertrude Cox at the University of Iowa, and taught courses in experimental design and sample
survey. His books Experimental Design (1950), Sampling Techniques (1953), and Statistical Methods (1967) with these colleagues were
the prominent textbooks of the time period. He eventually ended up in the Department of Statistics at Harvard University, in 1957
and retired as professor emeritus in 1976. He received many awards during his career, including two from the American Statistical
Association. He was editor of the Journal of the American Statistical Association from 1945 to 1950. His many contributions to the
field of statistics also included the use of data transformations, analysis of variance with percents (dependent variable), analyses of
matched sample data, goodness of fit tests, and issues related to the chi-square test developed by Karl Pearson (Anderson, 1980;
Dempster & Mosteller, 1981; Watson, 1982).

William Gemmell Cochran (1934) was recognized for his distribution of quadratic forms in a random normal system with
applications to analysis of covariance (ANCOVA). His Cochran theorem was expanded to show that ANOVA can be extended to
situations requiring adjustment for covariate variables. He therefore postulated analyzing adjusted means in ANCOVA. His applied
work in this area was from his agriculture experimental design work at Rothamsted, where he addressed the practical concerns of
farmers and breeders. He further addressed problems in biomedical research with the development and use of clinical trials and
development of research protocols.

Assumptions
The ANOVA assumptions are listed below, and when not met, alternative approaches have been suggested
(Lomax & Hahs-Vaughn, 2012, pp. 309–331).

1. Observations are independent of each other


2. Homogeneity of variance (population variances of groups are equal)
3. Normal distribution of dependent variable(s)

ANCOVA requires the following additional assumptions:

4. Dependent variable continuous measure and fixed factor independent group variable
5. Relation between dependent and independent variables are linear
6. Covariate variables and independent variables are not related
7. The regression line for the groups are parallel
8. Homoscedasticity of regression slopes

The continuous dependent variable is required to calculate means. The fixed factor indicates exclusive group
membership categories. The linearity assumption can be assessed by visual inspection of scatter plots and the
Pearson correlation of X and Y. There are nonlinear ANCOVA methods, but these are not covered in this
book (Huitema, 1980). The covariate variables should be related to the dependent variable and not to the
independent variable (group). If the regression lines are not parallel for each group, then separate regression
lines should be used for each group for prediction. Generally, this assumption is not checked, and a common
regression line is fit for all the data with the common slope (beta weight) used for computing the adjusted
means. To check whether lines are parallel for each group, introduce an interaction term in the model
statement: Posttest = Group + Pretest + Group * Pretest. The Group term would test if groups had different
intercepts, Pretest would yield a common slope value, and the interaction term (Group * Pretest) would test if
the group regression lines were parallel. To check whether the variance around the regression line is the same
for groups (homoscedasticity), we would compare the mean square error (MSE) from the separate group
regression analyses. The basic ANCOVA procedures for computing separate regression equations and a
common regression equation when assumptions are met have been presented in numerous multiple regression
textbooks, for example, Pedhazur (1997).
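
A minimal sketch of these checks in R, assuming a data frame named ancova_df with variables Posttest, Pretest, and Group (these names are illustrative, not from the text):

  # Test homogeneity of regression slopes by adding a Group x Pretest interaction
  fit <- aov(Posttest ~ Group + Pretest + Group:Pretest, data = ancova_df)
  summary(fit)   # a nonsignificant interaction supports parallel (equal) slopes
  # Compare the error variance from separate within-group regressions (homoscedasticity)
  by(ancova_df, ancova_df$Group,
     function(d) summary(lm(Posttest ~ Pretest, data = d))$sigma^2)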

128
Multivariate Analysis of Covariance
The use of covariate variables to adjust means is linked to two basic research design objectives: (1) eliminate
systematic bias and (2) reduce the within-group error SS. The best way to address systematic bias is to use
random sampling techniques; however, intact designs by definition are not formed using random sampling.
For example, students who qualify for the Head Start program would be considered an intact group. When
random assignment is not possible, then covariate adjustment of the means helps reduce systematic bias
(intact groups that differ systematically on several variables). The within-group SS is due to individual
differences among the subjects in a group. This can be addressed by selecting more homogeneous groups of
subjects, using a factorial design with blocking on key variables, using repeated measures ANOVA, or using
covariate variables to adjust group means. The purpose of MANCOVA is to adjust posttest means for initial differences in groups (generally based on pretest measures of intact groups, where random selection and random assignment to groups were not possible).

ANCOVA techniques combine ANOVA and multiple regression. ANOVA would test for mean differences
(intercepts), while the multiple regression technique would provide a common slope to compute adjusted
group means. MANCOVA is an extension of ANCOVA, where extraneous variables that affect the
dependent variables are statistically controlled, that is, the dependent variable means are adjusted. The
adjustment of dependent variable means in different groups, given a single covariate, is computed as follows:
$$\bar{Y}_{j(\mathrm{adj})} = \bar{Y}_j - b_w\left(\bar{X}_j - \bar{X}\right),$$

where $\bar{Y}_{j(\mathrm{adj})}$ = adjusted dependent variable mean in group j, $\bar{Y}_j$ = dependent variable mean before adjustment, $b_w$ = common regression coefficient in the entire sample, $\bar{X}_j$ = mean of the covariate variable for group j, and $\bar{X}$ = grand mean of the covariate variable (covariate variable mean for the entire sample). Obviously, if
the covariate means of each group are the same, then no adjustment to the dependent variable would occur,
that is, groups are initially equal prior to any treatment or intervention in the research design.

129
MANCOVA Example
MANCOVA extends the univariate ANCOVA to include more than one dependent variable and one or
more covariate variables. The null hypothesis in MANCOVA is that the adjusted population means of the
dependent variables are equal. This is tested with Wilks’s Λ. A basic example with two dependent variables, two groups, and one covariate variable is presented using data from Stevens (2009, p. 302). The two dependent variables are posttest scores (Postcomp and Posthior), the grouping variable is gender (male = 1, female = 2), and the covariate variable is the pretest score (Precomp).

We would first install and load the necessary packages to conduct the various analyses. Next, we input the
data for the two groups into matrices, which are then combined into a data frame with variable labels. The
data set, mancova, is attached so that the variable names can be used in the manova() function. The R
commands are specified as follows:
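
The book shows these commands as a screenshot; a hedged sketch of the setup, with illustrative values in place of the actual Stevens (2009, p. 302) scores, is:

  install.packages(c("psych", "effects"))   # if not already installed
  library(psych); library(effects)
  # Illustrative values only -- replace with the Stevens (2009, p. 302) data
  # columns: Postcomp, Posthior, Precomp
  group1 <- matrix(c(110, 15, 100,  118, 18, 108,  105, 12, 96,
                     112, 16, 103,  120, 19, 110), ncol = 3, byrow = TRUE)
  group2 <- matrix(c(122, 20, 105,  128, 22, 112,  115, 17, 101,
                     125, 21, 109,  130, 23, 115), ncol = 3, byrow = TRUE)
  mancova <- data.frame(rbind(group1, group2),
                        group = factor(rep(1:2, times = c(nrow(group1), nrow(group2)))))
  names(mancova)[1:3] <- c("Postcomp", "Posthior", "Precomp")
  attach(mancova)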

130
The MANCOVA with the ANOVA summary table for Wilks’s Λ and Type III SS is run on the data set. The
R commands are as follows:
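
A sketch of one way to fit and summarize the model (not necessarily the author’s exact commands; car::Anova() is one route to Type III tests):

  fit <- manova(cbind(Postcomp, Posthior) ~ Precomp + group + group:Precomp, data = mancova)
  summary(fit, test = "Wilks")   # multivariate tests, including the interaction (parallel slopes)
  summary.aov(fit)               # univariate ANOVA tables
  # One option for Type III SS is car::Anova() on the equivalent lm() fit:
  # library(car)
  # Anova(lm(cbind(Postcomp, Posthior) ~ Precomp + group, data = mancova), type = "III")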

Note: Stevens (2009) ran separate models, so his degrees of freedom differ from those reported here. I ran a single model and report those results.

You can run the other summary commands to obtain the Pillai, Hotelling–Lawley, and Roy values. These

131
statistics will have the same values as Wilks’s Λ because of specifying Type III SS. Also, the order of entry for
the variables will not affect the partitioning of the SS. Recall that Type I SS would yield different results due
to variable entry order.

The findings indicated that the interaction effect was nonsignificant. Therefore, the assumption of parallel
slopes holds, that is, the two groups have the same linear relation between the dependent variables and the
pretest variable. The group means on the joint dependent variables were statistically significantly different (F
= 6.76, df = 2, 24, p = .005). However, the covariate variable was also statistically significant. This indicated
that the two groups had significantly different pretest means on Precomp, thus the two groups did not start out
the same. The fact that the two groups were initially different forms the basis for us wanting to adjust the
posttest means of the dependent variables by including the pretest variable in the model.

132
Dependent Variable: Adjusted Means
The manova() function with the pretest variable tests the adjusted means of the dependent variable. We can
run the lm() function to obtain the regression slope values for an equation to compute the adjusted means, but
it is easier to use the aov() function. To see the original dependent variable means, use the describeBy()
function in the psych package. The R command for the original dependent variable means is given as follows:
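
A hedged version of that command:

  library(psych)
  describeBy(mancova[, c("Postcomp", "Posthior")], group = mancova$group)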

The ANCOVA summary table, aov() function, for the two dependent variables using just the pretest variable
and group membership variable are listed. The effect() function for adjusted means of each dependent variable
is run after each ANCOVA. The R commands for each are listed below with their corresponding output.
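
A sketch of the two ANCOVAs and their adjusted means (the effect() function is in the effects package):

  library(effects)
  fit1 <- aov(Postcomp ~ Precomp + group, data = mancova)
  summary(fit1)
  effect("group", fit1)   # covariate-adjusted Postcomp means by group
  fit2 <- aov(Posthior ~ Precomp + group, data = mancova)
  summary(fit2)
  effect("group", fit2)   # covariate-adjusted Posthior means by group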

133
The two separate ANOVA tables indicate that both dependent variables contribute to the overall multivariate significance. They also help us understand how each dependent variable relates to the pretest variable. Postcomp group mean differences were statistically significant with a statistically significant
pretest, Precomp. Posthior group mean differences were statistically significant, but there was no significant
pretest difference. In MANCOVA, these two different ANOVA findings are taken together to yield

134
significant group posttest adjusted mean differences.

MANCOVA tests the differences in the adjusted posttest means. It is therefore helpful to compute the original dependent variable means and compare them with the adjusted dependent variable means.

The R commands to compute the posttest means, standard deviations, and pretest means for each group and
the entire sample are shown below.
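
One hedged way to obtain those statistics with the psych package:

  describeBy(mancova[, c("Postcomp", "Posthior", "Precomp")], group = mancova$group)  # by group
  describe(mancova[, c("Postcomp", "Posthior", "Precomp")])                           # entire sample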

The descriptive statistics for the two dependent variables for each group can now be summarized together.
Table 5.1 presents the original dependent variable means and the adjusted dependent variable means.

The separate ANCOVA results indicated that the pretest related differently with each of the dependent
variables. The correlation between Postcomp and Precomp was r = .764, which was statistically significant. The
correlation between Posthior and Precomp was r = − .1496, which was not statistically significant. To obtain the
different correlations between the covariate variable and each dependent variable use the following R

135
commands.
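
A sketch of those correlation tests:

  cor.test(mancova$Postcomp, mancova$Precomp)   # r = .76 in the text
  cor.test(mancova$Posthior, mancova$Precomp)   # r = -.15 in the text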

A graph of the relation between the covariate and each dependent variable can be viewed using the following
R commands.
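
A minimal base-R sketch of the two plots:

  par(mfrow = c(1, 2))
  plot(mancova$Precomp, mancova$Postcomp, xlab = "Precomp", ylab = "Postcomp")
  abline(lm(Postcomp ~ Precomp, data = mancova))
  plot(mancova$Precomp, mancova$Posthior, xlab = "Precomp", ylab = "Posthior")
  abline(lm(Posthior ~ Precomp, data = mancova))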

136
The covariate (Precomp) is significantly correlated with the Postcomp variable (r = .76, p < .001), but not the
Posthior variable (r = − .15, p = .44). The graphs visually display the pretest scores relation with the Postcomp
and Posthior scores. From a design perspective, this could be a mismatched situation. Each dependent variable
would normally have its own pretest measure, so Precomp would not be used to adjust means for Posthior.

137
Reporting and Interpreting
The MANCOVA technique should meet all the assumptions of the MANOVA technique and report that
the additional assumptions for the MANCOVA technique were met. A basic write-up for reporting
MANCOVA results would be as follows:

The dependent variables were continuous, linear, and normally distributed variables with equal variance–covariance matrices between the groups, thus meeting the MANOVA assumptions. In addition, the relations between the dependent and covariate variables were linear, and the two groups had parallel regression lines with homoscedasticity, thus equal slopes and variances, which met the additional assumptions for MANCOVA. This was indicated by a
nonsignificant group by pretest interaction (F = 1.90, df = 2, 24, p = .17). The covariate variable was
statistically significant (F = 22.79, df = 2, 24, p < .001), which indicates that the groups were initially different
on the pretest, thus requiring adjustment to the posttest means. The groups were statistically different on the
adjusted posttest means (F = 6.76, df = 2, 24, p = .004). Females had higher dependent variable posttest means
than males.

The Stevens (2009) data set was chosen because it points out the difficulty in meeting the MANCOVA
assumptions, which are in addition to the MANOVA assumptions (not shown). The example showed the
importance of conducting univariate F tests for each dependent variable and covariate variable. The results
indicated that Precomp was correlated with Postcomp, but not with Posthior; groups were different on Postcomp
scores and Precomp was a significant pretest; and groups were different on Posthior scores, but no significant
pretest was indicated, thus the two univariate analyses had different results. The multivariate analysis
combines the individual variable effects; thus, sometimes it can mask the different univariate results.

It is difficult to meet the ANCOVA assumptions, yet researchers continue to use the technique despite
violating the assumptions. On the surface, the statistical control for pretest differences falls short. Researchers
have sought other methods when unable to conduct an experimental design with random assignment to
control for threats to internal validity (Campbell & Stanley, 1966). Matching or blocking on key variables has
been recommended, which aids in the selection of similar subjects for a comparison group.

Critics of ANCOVA point out drawbacks to making statistical adjustments to means over random
assignment of subjects to groups. Two issues cited were that the inclusion of covariate variables changes the
criterion variable (dependent variable) such that the adjusted means change the construct (Tracz, Nelson,
Newman, & Beltran, 2005), and the adjusted means technique does not match the research question of
interest, but propensity score analysis with unadjusted posttest scores will (Fraas, Newman, & Pool, 2007). I
therefore turn my attention to the propensity score method.

138
Propensity Score Matching
In experimental research designs, random assignment would control for bias in subject selection and other
threats to internal validity; however, in nonexperimental research designs, matching subjects on the covariate
variable(s) is generally recommended rather than statistical adjustment to the means. Propensity score
methods have been advocated in place of previous matching or blocking methods (D’Agostino, 1998).
Propensity score matching (PSM) uses covariate variables to obtain a matched sample of subjects (Ho, Imai,
King, & Stuart, 2007). There are different PSM methods, so a researcher should exercise care in using PSM
(Schumacker, 2009). The R software has propensity score packages available (McCaffrey, Ridgeway, &
Morral, 2004—R twang package with mnps() function; Ho, Imai, King, & Stuart, 2007—R MatchIt package
with matchit() function to run various types of propensity score methods).

An SPSS data set with freshman students at a southern university was used to select a matching sample
(International Baccalaureate Organization [IBO], 2014). The data consisted of entering freshman students in
2007 and included gender, race, ethnicity, graduation status, and grade point averages for the 2007 to 2010
academic years. The researcher wanted to test GPA (grade point average) mean difference between AP
(Advanced Placement) and IB (International Baccalaureate) students across the 2007 to 2010 academic years,
however, the number of AP students outnumbered the IB students at the university. Specifically in 2007,
there were n = 279 IB freshman students compared with n = 6,109 AP freshman students at the university.

Propensity score analysis was conducted to select a matching group of 279 AP freshman students at the
university (Austin, 2011; Guo & Fraser, 2014; Holmes, 2014). In the study, gender, race, and graduation
status were used as covariates when selecting a matching group of AP freshman students. R software was used
with the MatchIt package using the “nearest neighbor” selection criterion with the covariates (http://www.r-project.org/). The R script to read in the SPSS data file, select a matching group of students, and then write out the IDs to a file is given below. The file of IDs was then used in SPSS to select the matching AP students.
The total number of freshman students was N = 558 (IB = 279 students; AP = 279 students). The R script file
commands were as follows:
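
A hedged sketch of such a script; the file name and variable names (ap_ib.sav, status, gender, race, gradstatus, id) are assumptions, not the study’s actual names:

  library(foreign); library(MatchIt)
  freshman <- read.spss("ap_ib.sav", to.data.frame = TRUE)           # read the SPSS data file
  m.out <- matchit(status ~ gender + race + gradstatus,              # status assumed coded 1 = IB, 0 = AP
                   data = freshman, method = "nearest")              # nearest neighbor matching
  matched <- match.data(m.out)                                       # the matched cases (N = 558)
  write.csv(matched$id, file = "matched_ids.csv", row.names = FALSE) # IDs used to select cases in SPSS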

139
In PSM, it is important to check that the two samples are equivalent on the covariate variables used in the
matching process: Did PSM achieve similar numbers of AP students across gender, race, and graduation
completion? Chi-square analyses of status by gender, race, and graduation are presented in Tables 5.2, 5.3, and
5.4, respectively. Table 5.2 indicates the cross-tabulation of AP and IB students with gender (χ2 = 1.54, p =
.21). Table 5.3 indicates the cross-tabulation of AP and IB students with race (χ2 = 5.27, p = .15). Table 5.4
indicates the cross-tabulation of AP and IB students with graduation (χ2 = .23, p = .62). The chi-square
statistics for all the propensity score analyses were nonsignificant, which indicated that the PSM did provide a
matching number of AP to IB freshman students across the covariate variables.
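
Continuing the sketch above, the balance checks could be run as:

  chisq.test(table(matched$status, matched$gender))      # gender balance
  chisq.test(table(matched$status, matched$race))        # race balance
  chisq.test(table(matched$status, matched$gradstatus))  # graduation balance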

The ability to obtain a matched sample of subjects permits statistical analysis of mean differences on
dependent variables without having to meet the assumptions in ANCOVA. It also doesn’t change the
construct or test the wrong hypothesis by using adjusted means. Overall, the matching of subjects provides a
sound research design option that does not involve statistical adjustments to means.

140
141
Summary
MANCOVA combines the approach of testing mean differences with the multiple regression approach of
estimating slope, or rate of change. Basically, the dependent variable means are adjusted based on their relation with one or more covariate variables. The intent is to statistically adjust for group pretest
differences, thus equating groups at the beginning of a research design. This statistical adjustment of the
dependent variable means has been scrutinized because it changes the meaning of the dependent variable. In
different disciplines, the research design doesn’t permit the random selection and assignment to groups due to
intact groups; thus, alternative methods have been advocated. Recently, the PSM approach has been advocated to select a matching set of subjects based on similar values on the covariate variables. In
practice, the random selection and random assignment of subjects to experimental and control groups is the
gold standard to control for threats to internal validity.

142
Web Resources
Introduction to Propensity Score Matching—UseR! 2013 Conference

http://jason.bryer.org/talks/psaworkshop.html

Software for Propensity Score Matching

http://www.biostat.jhsph.edu/~estuart/propensityscoresoftware.html

Video on Propensity Score Matching Using R

http://www.youtube.com/watch?v=Z8GtYGESsXg

143
References
Anderson, R. L. (1980). William Gemmell Cochran 1909–1980: A personal tribute. Biometrics, 36,
574–578.

Austin, P. C. (2011). An introduction to propensity score methods for reducing the effects of confounding in
observational studies. Multivariate Behavioral Research, 46(3), 399–424.

Campbell, D. T., & Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Boston,
MA: Houghton Mifflin.

Cochran, W. G. (1934). The distribution of quadratic forms in a normal system with applications to analysis
of covariance. Proceedings of Cambridge Philosophical Society, 30(2), 178–191.

D’Agostino, R. B. (1998). Tutorial in biostatistics: Propensity score methods for bias reduction in the
comparison of a treatment to a non-randomized control group. Statistics in Medicine, 17, 2265–2281.

Dempster, A. P., & Mosteller, F. (1981). In Memoriam. William Gemmell Cochran 1909–1980. The
American Statistician, 35(1), 38.

Fraas, J. W., Newman, I., & Pool, S. (2007). The use of propensity score analysis to address issues associated
with the use of adjusted means produced by analysis of covariance. Multiple Linear Regression Viewpoints,
33(1), 23–31.

Guo, S., & Fraser, M. W. (2014). Propensity score analysis: Statistical methods and applications (2nd ed.).
Thousand Oaks, CA: Sage.

Ho, D., Imai, K., King, G., & Stuart, E. (2007). Matching as nonparametric preprocessing for reducing
model dependence in parametric causal inference. Political Analysis, 15(3), 199–236.

Holmes, W. M. (2014). Using propensity scores in quasi-experimental designs. Thousand Oaks, CA: Sage.

Huitema, B. E. (1980). The analysis of covariance and alternatives. New York, NY: Wiley.

International Baccalaureate Organization. (2014). Final report: A comparison of IB and non-IB incoming
freshman students. New York, NY: Author.

144
Lomax, R. G., & Hahs-Vaughn, D. L. (2012). An introduction to statistical concepts (3rd ed.). New York,
NY: Routledge (Taylor & Francis Group).

McCaffrey, D., Ridgeway, G., & Morral, A. (2004). Propensity score estimation with boosted regression for
evaluating adolescent substance abuse treatment. Psychological Methods, 9(4), 403–425.

Pedhazur, E. J. (1997). Multiple regression in behavioral research: Explanation and prediction (3rd ed.).
Orlando, FL: Harcourt Brace College.

Schumacker, R. E. (2009). Practical issues to consider before using propensity score analysis. Multiple Linear
Regression Viewpoints, 35(2), 1–3.

Stevens, J. P. (2009). Applied multivariate statistics for the social sciences (5th ed.). New York, NY:
Psychology Press.

Tracz, S. M., Nelson, L. L., Newman, I., & Beltran, A. (2005). The misuse of ANCOVA: The academic and
political implications of Type VI errors in studies of achievement and socioeconomic status. Multiple
Linear Regression Viewpoints, 31(1), 19–24.

Watson, G. S. (1982). William Gemmell Cochran 1909–1980. Annals of Statistics, 10, 1–10.

145
6 Multivariate Repeated Measures

Assumptions
Advantages of Repeated Measure Design
Multivariate Repeated Measure Example
Single Dependent Variable
Sphericity
Several Dependent Variables: Profile Analysis
Graphical Display: Parallelism
Difference Scores
Doubly Multivariate Repeated Measures
lme4 Package
Reporting and Interpreting Results
Summary
Exercises
Web Resources
References

There have been many prominent scholars in various disciplines who have contributed to multivariate
statistics: Charles S. Peirce, Benjamin Gompertz, Herman O. A. Wold, Calyampudi R. Rao, George E. P.
Box, David Cox, Arnold Zellner, and Theodore W. Anderson, to name only a few. I have chosen to highlight
Theodore W. Anderson who specialized in the analysis of multivariate data.

Multivariate repeated measures extends MANOVA to research designs where the dependent variables are
repeated measurements. There are several possible research designs that involve different configurations of the
repeated measurements. First, one dependent variable can be measured several different times. For example, a
teacher measures student progress in algebra by administering an algebra exam at the beginning of the
semester, middle of the semester, and end of the semester. A second research design could have several
different dependent variables all measured at the same time. For example, the Minnesota Multiphasic
Personality Inventory (MMPI) has 10 scales (hypochondriasis, depression, hysteria, psychopathic deviate,
masculinity/femininity, paranoia, psychasthenia, schizophrenia, hypomania, and social introversion), where
the scale scores represent the multiple measures of the dependent variable. The researcher might be interested
in testing whether a sample of men and women differ across the 10 scales. A third research design, doubly
multivariate repeated measures, has several different dependent variables, not all measured on the same scale,
but measured at several different times. For example, a teacher could measure reading achievement, math
achievement, and algebra exam scores at three different time points in the semester to assess differences in
male and female students. The nature of the repeated measurement of the dependent variables defines the
different approaches taken in the analysis. This chapter will not explore the deeper realm of time-series analysis in business and econometric models, but rather focuses on the applications used in the social sciences.

146
http://statweb.stanford.edu/~ckirby/ted/

Theodore W. Anderson (June 5, 1918, to present) was born in Minneapolis, Minnesota. In 1937, he received his AA degree from
North Park College. In 1939, he received his BS degree from Northwestern University. He received his MA in 1942 and PhD in
1945, both from Princeton University. From 1947 to 1948, Ted received a Guggenheim Fellowship and studied at the University of
Stockholm and the University of Cambridge. From 1946 to 1966, he was a faculty member at Columbia University. In 1967, he
moved to Stanford University. In 1988, Dr. Anderson was named an emeritus professor of statistics and emeritus professor of
economics. He served as editor of the Annals of Mathematical Statistics from 1950 to 1952. He was elected President of the Institute
of Mathematical Statistics in 1963. He wrote his popular book An Introduction to Multivariate Analysis in 1958, which is currently
titled, Introduction to Multivariate Statistical Analysis (Anderson, 2003).

147
148
Assumptions
The many assumptions discussed in the previous chapters, especially MANOVA, also apply when conducting
multivariate repeated measures. A researcher should screen the data and address issues related to outliers,
multicollinearity, linearity, and homogeneity of variance–covariance matrix. We have previously learned that
they dramatically affect statistical analysis. The three specific assumptions in multivariate repeated measures
are as follows:

Independence of observations
Multivariate normality
Sphericity

The lack of independent observations severely compromises valid inferences from statistical results. A random sample of independent observations is critical for most statistical methods. MANOVA, hence
multivariate repeated measures, is robust to violations of multivariate normality. However, when you have an
extreme case of unequal variance–covariance matrices between groups, unequal sample sizes, and small sample
sizes, the impact is most profound. In repeated measures, we also need to be sensitive to the correlation
among the repeated measures, that is, sphericity, which has been referred to as circularity. Sphericity requires
that the variance of the differences in pairs of repeated measures be equal. Sphericity should not be confused
with compound symmetry, which is a more stringent condition for conducting multivariate repeated measures
when the researcher assumes that the population variances and covariances are equal.

Box (1954) indicated that if the sphericity assumption is not met, the F test is biased. Huynh and Feldt (1970) further indicated that sphericity was a necessary condition for the F test to be accurate at the specified alpha level. Greenhouse and Geisser (1959) computed ε to indicate the extent to which a variance–covariance matrix deviated from sphericity. When ε = 1, sphericity is met; that is, the variances of the differences in pairs of observations are equal. The correction for lack of sphericity has been to adjust the degrees of freedom. The Greenhouse–Geisser correction is considered conservative, thus it underestimates ε, while the Huynh–Feldt correction overestimates ε. Stevens (2009) recommends using the estimate of sphericity to compute ε(k − 1) and ε(k − 1)(n − 1) degrees of freedom, or taking the average of the Greenhouse–Geisser and Huynh–Feldt values, and he strongly recommended against using the Mauchly test of sphericity.

149
Advantages of Repeated Measure Design
Sphericity is generally not a concern in multivariate repeated measures but is prominent in univariate repeated
measures designs. This is mainly due to the adjustment made to the degrees of freedom. However, a
researcher should be cognizant of the issues and assumptions that affect statistical analysis, given that
statistical inference is the goal of statistical analysis of data. Consequently, the more pitfalls a researcher can
avoid in analyzing the data, the better the inferential results.

An advantage of conducting repeated measures designs has been that subjects act as their own control in the
study. This translates into requiring smaller sample sizes and increased power due to a multivariate null
hypothesis. A researcher can explore different a priori research designs for different sample sizes and power
using G*Power 3 (Faul, Erdfelder, Lang, & Buchner, 2007).

For example, various results are listed giving effect size, alpha, power, number of groups, and number of
repeated measures for two different multivariate research designs: within subjects (subject differences in
repeated measures) and between and within subjects (group differences in repeated measures). Table 6.1
shows that sample size decreases when adding more repeated measures of subjects for the within- and
between-research designs, keeping other factors constant. Sample size increases when conducting a between-
and within-research design with interaction effects, due to sample size requirement for cell means. A
researcher today is not overly concerned about these small sample sizes given the numerous national databases with thousands of subjects.

150
151
Multivariate Repeated Measure Examples

152
Single Dependent Variable
A research design that measures subjects at three or more time points reflects the single dependent variable
multivariate repeated measures design. It is wise to have three repeated measures; otherwise, you are simply conducting a pre–post difference analysis. A minimum of three repeated measurements is required to compute a
slope, which indicates rate of change.

The student data are typically entered into a data frame as follows:
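
The scores appear only in the book’s screenshot; the sketch below uses placeholder values chosen to reproduce the test means reported later (6.1, 10.6, 15.3), not the book’s data:

  # 10 students, three tests over a 6-week period (placeholder scores)
  student <- data.frame(id = 1:10,
                        test1 = c(5, 6, 7, 5, 6, 7, 6, 5, 7, 7),
                        test2 = c(10, 11, 10, 9, 11, 12, 10, 10, 11, 12),
                        test3 = c(15, 16, 14, 15, 16, 15, 14, 16, 15, 17))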

When analyzing repeated measures data, the rectangular data file, student, needs to be converted to a person
period data set (Schumacker, 2014). The reshape package has a melt() function that easily creates the required
person–period data set for repeated measures research designs.
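
A sketch of that conversion, assuming the id column created above:

  library(reshape)
  longstudent <- melt(student, id.vars = "id")   # columns: id, variable (test), value (score)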

153
The research question of interest is whether students improved over the 6-week period—that is, whether
scores increased over time. The nlme package with the lme() function was used to analyze the repeated
measurement data on the 10 students who took 3 tests over a 6-week period.

The lme() function requires a group level membership variable, which is called variable in the data set. We
need to attach() the file to be able to use the names in the data set. Also, we need to declare a group level
membership variable using the factor() function.

The repeated measures data are now analyzed by specifying the dependent variable, value, predicted by group
membership, variable, using the maximum likelihood estimation method.
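
A hedged sketch of these steps (identifying subjects by id in the random-effect term is my assumption):

  library(nlme)
  attach(longstudent)
  longstudent$variable <- factor(longstudent$variable)   # group (test) membership variable
  fit.lme <- lme(value ~ variable, random = ~ 1 | id, data = longstudent, method = "ML")
  summary(fit.lme)   # t values for the repeated test contrasts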

154
The t values indicate a statistically significant difference from test1 to test2 (t = 6.79, df = 18, p < .0001), and
test2 to test3 (t = 12.33, df = 18, p < .0001). Notice the p value is listed as 0, which simply indicates that it is smaller than the number of decimal places displayed. You could have conducted two separate dependent (paired) t tests to determine
this difference, but the alpha would be inflated, therefore we have an advantage by conducting the repeated
measures technique (Maxwell, 1980). In conducting a multiple dependent t test, researchers would make a
correction to the alpha level, a Dunn–Bonferroni correction. When the p values are so extreme, as in this case,
the correction would not yield an alpha level that would indicate a nonsignificant t test.

Sphericity
The lme() function does not report sphericity. You can run the analysis using the ez package and
ezANOVA() function, which does provide sphericity tests.
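
A sketch of that analysis (wid must be a factor):

  library(ez)
  longstudent$id <- factor(longstudent$id)
  ezANOVA(data = longstudent, dv = value, wid = id, within = variable)  # reports GGe and HFe corrections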

155
The analysis of variance provides the F test (F = 76.08, df = 2, 18, p < .0001) for testing the statistical significance of the three repeated measurements. The Greenhouse–Geisser epsilon (GGe) is .99 (p < .0001), so sphericity is met. The Huynh–Feldt epsilon (HFe) is 1.27 (p < .0001), so it also indicates that sphericity was met. Recall that the Greenhouse–Geisser correction is considered conservative, thus it underestimates ε, while the Huynh–Feldt correction overestimates ε. The expected value is ε = 1 when sphericity is met. The F test
confirms that the three test means increased over time.

We obtain the descriptive statistics for the three tests using the describeBy() function in the psych package
using the following R commands:
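
One way to do this, continuing the sketch above:

  library(psych)
  describeBy(longstudent$value, group = longstudent$variable)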

The student test means increased from 6.1 (test1) to 10.6 (test2) to 15.3 (test3). Notice we do not interpret
the variable, which indicates group membership. The describeBy() function is useful when many continuous
variables are present in a data set. In this situation, we could have obtained the means and standard deviations
by simply using the following R commands:
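
For instance, with the original rectangular file:

  sapply(student[, c("test1", "test2", "test3")], mean)
  sapply(student[, c("test1", "test2", "test3")], sd)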

156
157
Several Dependent Variables: Profile Analysis
The multivariate repeated measures research design with multiple dependent variables is referred to as profile
analysis (Tabachnick & Fidell, 2007). The primary research question is whether groups have parallel profiles
(lines displaying the means of the dependent variables). The research questions can also involve whether one group is higher than another group (the levels question) or whether the dependent variable means, combined across the groups, are similar, which is termed flatness.

The data set reported in Tabachnick and Fidell (2007, p. 317) was used to conduct the example profile
analysis. The hypothetical data represent three different occupational groups: belly dancers, politicians, and
administrators. The data represent their rating of four leisure activities on a scale of 1 to 10. The multiple
dependent variables (leisure activities) are read, dance, tv, and ski. The names represent the group membership
variable. The rep() function is used to repeat the group names 5 times. The data are entered as follows:
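
A sketch of the data entry; the ratings below are placeholders, not the Tabachnick and Fidell values:

  names <- factor(rep(c("belly", "politic", "admin"), each = 5))   # 5 subjects per group
  read  <- c(7, 8, 6, 7, 8,   4, 5, 4, 3, 5,   6, 6, 5, 7, 6)
  dance <- c(10, 9, 9, 10, 8, 4, 5, 3, 4, 4,   5, 6, 5, 5, 6)
  tv    <- c(8, 7, 8, 7, 7,   4, 5, 4, 4, 4,   5, 6, 5, 6, 5)
  ski   <- c(5, 6, 5, 6, 5,   3, 4, 3, 4, 3,   5, 5, 6, 5, 5)
  multdv <- data.frame(names, read, dance, tv, ski)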

Graphical Display: Parallelism


A visual plot of the groups across the four leisure activities can be most helpful. To create the graph, we must
first install the ggplot2 package, then use a ggplot() function. The ggplot() function requires the data to be in a
person–period data set. We will need to make sure the following R packages are installed and loaded.
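
For example:

  install.packages(c("ggplot2", "reshape2", "psych"))   # if not already installed
  library(ggplot2); library(reshape2); library(psych)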

The person–period data set was created by entering the means in a vector along with the group names (belly,
politic, and admin) and the variable names (read, dance, tv, ski). These were put into the data frame, newfile.
This permits a simple plot of the means across the four leisure areas for the three groups. This visual display is
helpful in examining the group parallel profiles. The R commands that created the person–period data set

158
were as follows:
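
A hedged sketch; the column names (Group, Leisure, Means) are my labels, and the means are those computed from the placeholder ratings above:

  Group   <- rep(c("belly", "politic", "admin"), times = 4)
  Leisure <- rep(c("read", "dance", "tv", "ski"), each = 3)
  Means   <- c(7.2, 4.2, 6.0,  9.2, 4.0, 5.4,  7.4, 4.2, 5.4,  5.4, 3.4, 5.2)
  newfile <- data.frame(Group, Leisure, Means)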

Note: To make things easier, the describeBy() function in the psych package was used to compute the means,
then entered in the matrix above. You can also create a person–period file using the melt() function in the
reshape or reshape2 packages. You will also need to declare names as a factor, that is a group membership
variable. The R commands would be as follows:
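
For example, with reshape2:

  longdv <- melt(multdv, id.vars = "names")   # person-period form of the raw ratings
  longdv$names <- factor(longdv$names)        # group membership factor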

The administrator, politician, and belly dancer averages are plotted across the four leisure activities using the
ggplot2 package and ggplot() function. There are different ggplot() functions, but the geom_line() function
was used to draw lines on the graph. The aes() function specifies attributes you desire on the graph, for
example, I specified linetype = Group to draw different line types (color = Group; would give different colored
lines). The aes attributes are added in layers after the basic plot window is created using the ggplot() function.
The first command provides the data set and defines the x and y variables. The second command line provides
the different lines for each group and labels for the x and y axes. The R commands are as follows:
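
A sketch of those two command lines (column names as assumed above):

  newfile$Leisure <- factor(newfile$Leisure, levels = c("read", "dance", "tv", "ski"))  # keep activity order
  p <- ggplot(newfile, aes(x = Leisure, y = Means, group = Group))
  p + geom_line(aes(linetype = Group)) + xlab("Leisure Activity") + ylab("Mean Rating")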

159
The graph of mean leisure activity clearly shows that the groups do not have parallel profiles. The lines for
each group connect their respective mean rating for each of the leisure activities. Belly dancers rated all of the
activities higher than either administrators or politicians. Administrators were in the middle across the four
leisure activities. The lines are also not flat; they tend to show differences in mean ratings.

Difference Scores
Maxwell and Delaney (2004) explained that sphericity and compound symmetry are controlled by analyzing k
− 1 difference scores between the dependent variables. Stevens (2009) also stated, “In the multivariate case for
repeated measures the test statistic for k repeated measures is formed from the (k − 1) difference variables and
their variances and covariances” (p. 418). Tabachnick and Fidell (2007), when presenting profile analysis,
further discussed the test of parallelism and flatness by indicating the use of difference scores. They further
point out that which dependent variables are used to create the difference scores is arbitrary. The test of
parallelism is conducted on the difference scores in a one-way MANOVA. The difference scores represent the
slope between the two dependent variables used to calculate the score. If the difference is statistically
significant between the groups, then the profiles are not parallel. Given our graphical display, we would expect
a statistically significant finding that the group profiles are not parallel.

The difference scores for the k = 4 leisure activities yielded k − 1 or 3 difference scores. The computed
difference scores were for read versus dance, dance versus tv, and tv versus ski (same as Tabachnick & Fidell,
2007) using the multdv data set. The R commands are as follows:
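
A sketch of the difference-score computation:

  RD <- multdv$read  - multdv$dance   # read vs. dance
  DT <- multdv$dance - multdv$tv      # dance vs. tv
  TS <- multdv$tv    - multdv$ski     # tv vs. ski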

160
The one-way MANOVA can be conducted using the manova() function in the base stats package. The three
difference scores (RD, DT, TS) need to be put in a separate file, outcome. The names variable will need to be
declared a factor—that is, a group membership variable. So the R commands would be as follows:
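
A sketch of those commands:

  outcome <- cbind(RD, DT, TS)
  names.f <- factor(multdv$names)        # group membership factor
  fit.prof <- manova(outcome ~ names.f)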

The MANOVA provides several test statistics, which need to be output separately using the summary()
function (Pillai is the default value, but given here as well). The R commands are as follows:
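
For example:

  summary(fit.prof, test = "Pillai")
  summary(fit.prof, test = "Wilks")
  summary(fit.prof, test = "Hotelling-Lawley")
  summary(fit.prof, test = "Roy")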

161
The results for all four multivariate tests are statistically significant. Therefore, we can feel confident that the
groups do not have parallel profiles on the four leisure activities. The effect size is measured as a partial eta-
squared, which is computed using the Wilks’s Lambda value:
$$\text{partial } \eta^2 = 1 - \Lambda^{1/2} = 1 - (.076279)^{1/2} = .72.$$

The partial eta-square indicated that 72% of the variance in the difference scores is accounted for by the
profiles of the three groups. Tabachnick and Fidell (2007) extend the problem to include a test of flatness,
which uses the combined means of the three groups on each set of difference scores, and tests the grand mean
vector via Hotelling T2. This is, essentially, a one-sample Hotelling T2.

162
Doubly Multivariate Repeated Measures
In the doubly multivariate repeated measures design, the different dependent variables are repeatedly
measured over time for two or more groups. It is considered doubly multivariate because each dependent variable across time has a correlated effect due to measurement at the different time periods, and there is a
correlation between the dependent variables at each time period. For example, boys and girls are measured in
math and reading across Grades 3, 8, and 11. The math scores across the grade levels have correlated effects as
do the reading scores. Also, the math and reading scores are correlated at each grade level. A research question
testing whether boys and girls are the same in math and reading across the grade levels would be considered a
doubly multivariate repeated measures.

The time effect (when dependent variables are measured) has the assumption of sphericity. When conducting a doubly multivariate analysis, the sphericity assumption is removed. However, I think we are destined again to
use difference scores in the analysis. The sample size required for this type of design is usually based on the
between-subjects effect (group differences); however, I recommend selecting sample size based on the
possibility of an interaction effect and the number of repeated measures for the dependent variable (see Table
6.1).

The data set, dblmult.dat, in Tabachnick and Fidell (6th ed., ASCII file type) will be used. It can be
downloaded directly at http://www.pearsonhighered.com/tabachnick/. The data set was read directly from the
Internet. The variable names were included in the file (header=TRUE), and the data values were tab
delimited (sep = “\t”). The following R command read in the data set.
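
A sketch of that command; the full path to dblmult.dat on the companion site is not reproduced here and must be supplied:

  mydata <- read.table("http://www.pearsonhighered.com/tabachnick/.../dblmult.dat",  # complete the path
                       header = TRUE, sep = "\t")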

Next, we can obtain the means for each group on the dependent variables and put them into data vectors for
plotting. The psych package and the describeBy() function permit an easy way to obtain the means for the

163
intercept and slope values.
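
A hedged sketch, assuming the grouping factor is the first column and is named group:

  library(psych)
  describeBy(mydata[, -1], group = mydata$group)   # means of the intercept and slope variables by group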

Next, we need to create the data vectors for the two groups, four sessions, and only the intercept means. The
Grp variable represents a data vector with four Gs and then four symbols. The session data vector contains the
number sequence 1 to 4, which is repeated for the second group. The Mns data vector contains the 8 intercept
means from the describeBy( ) function above. The three data vectors are combined into the data file, intcp.

This is the data file format that represents a person–period structure necessary for the ggplot() function,
which references the data file (intcp), and the x and y variables. The geom_line() function provides for
additional layers to be added to the graph. The aes() function provides for plotting lines of the groups and the
labels for the title of the graph and the x and y axes. The R commands are now given as follows:
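
A sketch of those commands; the eight intercept means shown are placeholders to be replaced with the describeBy() output:

  Grp     <- c(rep("G", 4), rep("symbol", 4))
  session <- rep(1:4, 2)
  Mns     <- c(420, 460, 510, 570, 410, 445, 480, 515)   # placeholder intercept means
  intcp   <- data.frame(Grp, session, Mns)
  p <- ggplot(intcp, aes(x = session, y = Mns, group = Grp))
  p + geom_line(aes(linetype = Grp)) +
    labs(title = "Mean Intercept Across Angle Sessions", x = "Angle Session", y = "Mean Intercept")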

164
My visual inspection of the two groups across the four angle sessions in Figure 6.1 indicates a nonparallel
finding in the multivariate repeated measures analysis. To confirm this, you can run the manova() function
with the four intercept variables in the mydata file.
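
A hedged sketch, assuming the four intercept columns are named int1 through int4:

  fit.int <- manova(cbind(int1, int2, int3, int4) ~ group, data = mydata)
  summary(fit.int, test = "Wilks")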

Figure 6.1 Mean Intercept Across Angle Sessions

165
The MANOVA repeated measures results confirm that the two groups do not have a parallel profile across
the four angle sessions.

We will now repeat the process for the four slope variables to assess change in reaction time across the four
angle sessions. The changes include adding the slope means to a data vector; Grp and session are already saved
in the workspace, so they do not need to be respecified. The Mnslope data vector contains the slope means.
The ggplot() function inserts this new data file name, changes y = Mnslope, and changes labels accordingly.
The set of R commands are now given with the minor changes as follows:
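
A sketch with the same assumptions; the slope means are again placeholders:

  Mnslope <- c(0.55, 0.60, 0.64, 0.70, 0.53, 0.59, 0.65, 0.69)   # placeholder slope means
  slp <- data.frame(Grp, session, Mnslope)
  p <- ggplot(slp, aes(x = session, y = Mnslope, group = Grp))
  p + geom_line(aes(linetype = Grp)) +
    labs(title = "Mean Slope Across Angle Sessions", x = "Angle Session", y = "Mean Slope")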

Figure 6.2 Mean Slope Across Angle Sessions

166
The visual inspection in Figure 6.2 of the two groups across the four angle sessions is not clear on whether the
slope means are parallel. The multivariate repeated measures analysis should help confirm whether a parallel
group profile is present. We need to once again run the manova() function, but this time with the slope
variables in the mydata file.
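
With the slope columns assumed to be named slope1 through slope4:

  fit.slp <- manova(cbind(slope1, slope2, slope3, slope4) ~ group, data = mydata)
  summary(fit.slp, test = "Wilks")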

167
The multivariate repeated measures of slope means for the two groups across the four different angle sessions
was not statistically significant. We would therefore conclude that a parallel profile exists for the two groups
across the four angle sessions for the slope means.

lme4 Package
The lme4 package provides an lmer() function for linear mixed models and an nlmer() function for nonlinear
mixed models. This permits the use of two or more factors in a person–period data set. The mydata file would
be converted to a person–period data set as follows:
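
A hedged sketch of that conversion (the grouping column is again assumed to be named group):

  library(reshape2)
  longdata <- melt(mydata, id.vars = "group")   # stacks the four intercept and four slope columns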

168
We will need to add an id variable and a time variable to the data set using the following R commands.
Knowing the structure of the data set is helpful to provide the correct coding for id and time across the other
variables. We desire for the id to be listed for both slope and intercept, while time is coded across the sets of
slope and intercept values to reflect the four time periods.
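
A sketch of that coding; it assumes the melted rows are stacked column by column (int1 through int4, then slope1 through slope4) with n subjects per column:

  n <- nrow(mydata)
  longdata$id   <- factor(rep(1:n, times = 8))          # same subject ids for intercept and slope rows
  longdata$time <- rep(rep(1:4, each = n), times = 2)   # angle session 1-4 within each set of columns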

169
We are now ready to run the lmer() function to test the group, time, and group * time interaction effects. The
group * time interaction effect is a test of parallelism—that is, equal slopes between the groups.
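
A sketch of that model, with a random intercept for subjects as an assumption:

  library(lme4)
  fit.mixed <- lmer(value ~ group * time + (1 | id), data = longdata)
  anova(fit.mixed)   # F tests for group, time, and group x time (parallelism)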

The analysis of variance results indicated that the group * time interaction effect was not statistically significant
(F = .2669, p = .61).

The lmer() function does not report the p values for the F tests. The p values can be calculated using the pf()
function. The F tests for the main and interaction effects are calculated as follows by inserting the
corresponding F values:
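
For example (the denominator degrees of freedom here are placeholders to be taken from the anova() output):

  pf(0.2669, df1 = 1, df2 = 76, lower.tail = FALSE)   # p value for the group x time F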

170
Results indicated that there was no statistically significant group difference and no group * time interaction
effect. The interaction effect combined both the dependent variables, so overall, we would conclude that the
groups had parallel slopes (test of parallelism).

The test of parallelism does not reflect the two separate tests conducted earlier but rather a combined effect.
We would need to create two separate data files, then run the analysis for each.

This is done easily by simply extracting the intercept values into one data file, then extracting the slope values
into a different file. The R commands to run the two separate multivariate repeated measures analyses with
factor variables are given as follows:
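
A hedged sketch; subsetting by the melted variable names assumes the intercept and slope columns contain "int" and "slope" in their names:

  intlong <- subset(longdata, grepl("int", variable))     # intercept rows only
  slplong <- subset(longdata, grepl("slope", variable))   # slope rows only
  anova(lmer(value ~ group * time + (1 | id), data = intlong))
  anova(lmer(value ~ group * time + (1 | id), data = slplong))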

The F values are all statistically significant, indicating not only group and time differences but also different
intercepts across time (test of parallelism: groups had different profiles). This matches what we discovered
before and showed visually in a graph. We would therefore interpret the interaction effect.

171
The F values for the slope dependent variable indicated no group or group * time interaction effect. We would
therefore conclude parallel profiles for the groups (test of parallelism: same group profile). This matches what
we discovered and showed graphically before. We would be interpreting the main effect for time.

172
Reporting and Interpreting Results
The reporting of multivariate repeated measures results requires that a researcher specifies whether
assumptions were met (sphericity, equal variance–covariance matrices, multivariate normality, etc.), the type of
research design, descriptive statistics, whether planned contrasts were conducted, graphical displays, and what software package and procedure were used. This may seem cumbersome, but it provides much needed clarity for the reader; and of course, please be sure to answer the research question! I will provide a brief attempt at
what you should report for the doubly multivariate analysis in the chapter.

A doubly multivariate repeated measures design was conducted to compare two groups (G vs. symbol) on
their average reaction time (intercepts) and change in reaction time (slope) across four angles of rotation.
Figure 6.1 displays the mean intercept values across the four angle rotations for the two groups. It visually
appears that the groups do not have parallel profiles—that is, a significant interaction effect. The lme4 package in R was used to analyze a person–period data set. The combined effect of both dependent variables was nonsignificant (F = .27, p = .61) when testing the interaction effect. A separate analysis of the dependent
variables indicated different findings. For the intercept values, a statistically significant interaction effect was
present (F = 78, p < .0001). The groups had different profiles (nonparallel slopes) across the four angle
rotations. For the slope values, the interaction effect (F = .03, p = .87) and the main effect for group (F = .59, p
= .44) were not statistically significant. The main effect for time was statistically significant (F = 13.45, p =
.0003). The descriptive statistics report the means and standard deviations of the two dependent variables across the four angle rotations for the two groups.

173
Summary
Multivariate repeated measures is an extension of MANOVA with similar assumptions. The multivariate
repeated measures has the advantage of subjects being their own control, hence a smaller sample size is
generally required. Additionally, the multivariate method controls for an inflated Type I error rate, which
occurs when conducting multiple univariate tests, and yields more power. More important, the research design
with repeated measurements needs to be analyzed properly to assess change over time. I have presented some
basic research designs. The first research design indicated a single dependent variable repeated across time for
a group of subjects, which is a within-subjects design. The second research design indicated multiple
dependent variables referred to as profile analysis, where I used difference scores to control for sphericity. The
third research design indicated a doubly multivariate repeated measurement design where more than two
dependent variables and two factors were repeated, which included within- and between-subject variables and
an interaction term, which is a test of parallel slopes between groups. There are many variations to
longitudinal models and the analysis of change over time, which is beyond the scope of this book. I refer you
to the following books with expansive coverage of the topic: Collins and Horn (1991), Heck, Thomas, and
Tabata (2014), and Singer and Willett (2003), to name only a few.

Multivariate analysis is conducted given the research design and the associated research question. The
calculation of descriptive statistics and the visual plotting of the means is helpful in understanding the
outcome of the statistical analysis. It is also customary after finding significant multivariate results that a
researcher would conduct univariate tests. These tests are referred to as simple effects, post hoc, or planned
comparisons. Specific contrasts can be hypothesized and tested (Schumacker, 2014). Although we did not
follow through with any univariate tests after the multivariate analysis, the textbooks referenced in the preface
provide examples and explanations of these additional types of tests.

174
Exercises
1. What are the three assumptions that should be met when conducting a multivariate repeated measures analysis?
2. What are the two advantages of multivariate repeated measures over conducting paired t tests?
3. Define sphericity.
4. Why are difference scores recommended in repeated measures analyses?
5. Given the following data set, ch5ex3.dat, conduct a multivariate repeated measures analysis using the lme4 package and lmer()
function.

Note: Download files from the Internet (*.zip file). Extract and use ch5ex3.dat

175
Web Resources
Kick Start R for Repeated Measures

http://cran.r-project.org/doc/contrib/Lemon-kickstart/kr_repms.html

Overview of Multivariate Statistical Methods in R

http://cran.r-project.org/web/views/Multivariate.html

176
References
Anderson, T. W. (2003). Introduction to multivariate statistical analysis (3rd ed.). New York, NY: Wiley.

Box, G. E. P. (1954). Some theorems on quadratic forms applied in the study of analysis of variance
problems: II. Effect of inequality of variance and correlation between errors in the two-way classification.
Annals of Mathematical Statistics, 25, 484–498.

Collins, L. M., & Horn, J. L. (1991). Best methods for the analysis of change. Washington, DC: American
Psychological Association.

Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis
program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191.

Greenhouse, S. W., & Geisser, S. (1959). On methods in the analysis of profile data. Psychometrika, 24,
95–112.

Heck, R. H., Thomas, S. L., & Tabata, L. N. (2014). Multilevel and longitudinal modeling with IBM SPSS
(2nd ed.). New York, NY: Routledge (Taylor & Francis Group).

Huynh, H., & Feldt, L. S. (1970). Conditions under which mean square ratios in repeated measurement
designs have exact F distributions. Journal of the American Statistical Association, 65, 1582–1589.

Maxwell, S. E. (1980). Pairwise multiple comparisons in repeated measures designs. Journal of Educational
Statistics, 5, 269–287.

Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison
perspective (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.

Raykov, T., & Marcoulides, G. A. (2008). An introduction to applied multivariate analysis. New York, NY:
Routledge.

Schumacker, R. E. (2014). Learning statistics using R. Thousand Oaks, CA: Sage.

Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event
occurrence. New York, NY: Oxford University Press.

177
Stevens, J. P. (2009). Applied multivariate statistics for the social sciences (5th ed.). New York, NY:
Routledge (Taylor & Francis Group).

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). New York, NY: Pearson
Education.

Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics (6th ed.). New York, NY: Pearson
Education.

178
7 Discriminant Analysis

Overview
Assumptions
Dichotomous Dependent Variable
Box M Test
Classification Summary
Chi-Square Test
Polytomous Dependent Variable
Box M Test
Classification Summary
Chi-Square Test
Effect Size
Reporting and Interpreting
Summary
Exercises
Web Resources
References

179
http://infopages.com/api_images/socialmention/ronald_fisher/socialmention_ronald_fisher_654743.jpg

Sir Ronald Aylmer Fisher (February 17, 1890, to July 29, 1962) was born in London, England, and passed away at the age of 72 in
Adelaide, Australia. In 1909, Fisher attended Cambridge University on a scholarship. In 1912, he completed a bachelor’s degree in
astronomy and passed the Mathematics Tripos exam with the distinction of Wrangler. In 1914, he became a high school physics and
math teacher because he was unable to join the war efforts due to his poor eyesight. In 1918, he published a paper that clarified the
difference between Gregor Mendel’s genetic theory and Charles Darwin’s theory of evolution. He was the father-in-law of George
E. P. Box. In 1920, Fisher was offered the position of chief statistician under the auspices of Karl Pearson at the Galton Laboratory
and was also offered the opportunity to establish a statistical laboratory at Sir John Russell’s Rothamsted Agricultural Station, which
he accepted. His work at the agricultural station is famous for his study on experimental designs, the analysis of variance
methodology where he extended W. S. Gossett’s work at Guinness Brewery (student t test), and his concept of maximum likelihood
estimation, where he subsequently developed a range of multivariate methods to study the linkage of genes with different traits. In
1930, he presented his findings in a book, The Genetical Theory of Natural Selection. Fisher was acquainted with Pearson’s work from
his paper “Mathematical Contribution to the Theory of Evolution.” Fisher was also critical of Pearson’s work on the distribution of
the coefficient of correlation in samples from an infinite, normally distributed bivariate population. He sent Pearson an exact

180
solution for the distribution of the coefficient of correlation. This was the beginning of a bitter feud between them that lasted for
many years. Pearson published scathing articles in Biometrika criticizing Fisher’s work. The Royal Statistical Society began refusing
to publish Fisher’s papers, and he subsequently resigned in protest. Fisher, however, succeeded Pearson as the Galton professor of
eugenics at University College in London after Pearson’s retirement in 1933.

In 1936, Fisher was presented with a statistical problem by E. M. Martin, a naturalist, who was searching for a methodology that
would allow him to classify jaw bones recovered from a burial place as belonging to the categories of male or female. Fisher suggested
a dummy coded variable with a linear function of the jaw bone measurements that afforded maximum separation of the male and
female distributions. Fisher had formulated the methodology for the discriminant function (discriminant analysis). In 1939, Welch
showed that Fisher’s discriminant function was equivalent to the log likelihood ratio. In 1948, C. R. Rao, Fisher’s doctoral student at
Cambridge, extended his technique to more than two groups. Rao established the sufficiency of the discriminant function for testing
the discriminant function in the classification of an individual into polytomous groups (more than two groups).

Fisher held the Balfour Professor of Genetics at Cambridge from 1943 to 1957, received several awards, and wrote numerous articles
and several books. In 1952, Fisher was knighted Sir Ronald Aylmer Fisher for his distinguished scientific career.

181
Overview
The discriminant analysis involves research questions related to the classification of subjects into two or more
groups (dependent variable). In the case of two groups, multiple regression and discriminant analysis are
identical (Schumacker, Mount, & Monahan, 2002). When the classification accuracy of three or more groups
is desired with a set of independent variables, other fit statistics are reported than those provided in multiple
regression.

A linear discriminant equation can be expressed as $D_i = a + b_1X_1 + b_2X_2 + \cdots + b_nX_n$. The discriminant weights, $b_n$, are chosen to maximize the difference in $D_i$ scores. The discriminant score ($D_i$) is a value for each subject that indicates the probability of group membership. The ratio of the between-groups SS to the within-groups SS is an eigenvalue:
$$\lambda = \frac{SS_B}{SS_W}.$$

Wilks’s Lambda is used to test the null hypothesis that the population means on Di are equal. Wilks’s
Lambda is

$$\Lambda = \frac{SS_{\text{within-groups}}}{SS_{\text{total}}},$$

so a smaller Λ would lead to a rejection of the null hypothesis because a significant amount of variance is explained. Wilks’s Lambda is the variance not accounted for, so 1 − Λ is the variance explained. Discriminant
function analysis is mathematically equivalent to MANOVA, except that the group membership variable is
the predictor variable with the independent variables becoming a set of dependent variables. If MANOVA is
not significant, then the discriminant analysis would not provide a significantly different group membership
prediction. Given the similarity between discriminant function analysis and MANOVA, they have statistical
assumptions in common.

182
Assumptions
Discriminant function analysis is a parametric method that weights independent variables to predict group
classification. The most important assumptions are as follows:

The groups must be mutually exclusive and have equal sample sizes.
Discriminant function analysis is sensitive to outliers, so no outliers.
Groups should have equal variance–covariance matrices on independent variables.
The independent variables should be multivariate normally distributed.
The independent variables are not highly correlated (no multicollinearity).

Discriminant analysis is robust to a violation of normality when data are randomly sampled, sample sizes of
each group are large, and the groups have equal sample sizes. The Box M test can be run to determine if the
groups have the same variance–covariance matrices among the independent variables (Levene test would
suffice for two groups). Log or probit transformations of data should help make the data more normally
distributed. Discriminant analysis will perform better as sample size increases.

183
Dichotomous Dependent Variable
The following research question frames our use of discriminant analysis. Does knowledge of math and
English test scores permit the classification of students into at-risk and not-at-risk groups? The data set
contains a dichotomous dependent variable (group) with the two independent predictor variables (math and
english). The data set is created using the following R commands:
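
The book’s data-entry commands are not reproduced here. The following is a minimal sketch with made-up scores; the object name riskdat and all of the values are assumptions, so output from this toy data will not match the results reported below.

group   <- factor(rep(c("at-risk", "not-at-risk"), each = 10))   # dependent variable
math    <- c(45, 50, 52, 48, 55, 47, 51, 49, 53, 46,
             65, 70, 68, 72, 66, 71, 69, 74, 67, 73)
english <- c(60, 58, 62, 59, 61, 57, 63, 60, 58, 62,
             64, 66, 63, 67, 65, 62, 68, 64, 66, 63)
riskdat <- data.frame(group, math, english)                      # hypothetical data set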

The discriminant function is lda() in the R MASS package. The R commands for the two group discriminant
function analysis would be as follows:
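
A sketch of the two-group run, continuing the hypothetical riskdat object created above:

library(MASS)
risk.lda <- lda(group ~ math + english, data = riskdat)
risk.lda   # prints prior probabilities, group means, and the linear discriminant coefficients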

Results indicate that both groups have equal sample sizes—that is, equal prior probability. The group means
indicate a good separation (recall MANOVA), so the subjects’ Di scores for group classification should be
sufficiently different. Finally, the discriminant weights are given, which compute the Di scores. The linear
discriminant function equation would be Di = .63(math) − .11 (English).

184
185
Box M Test
The equality of the group variance–covariance matrices can be tested using the following boxM() function in
the biotools package. The R commands and output are as follows:
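
A sketch of the Box M test, continuing the hypothetical example:

install.packages("biotools")   # only needed once
library(biotools)
boxM(riskdat[, c("math", "english")], riskdat$group)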

The results indicate that the two groups have similar variance–covariance matrices because of a statistically
nonsignificant Box M test (χ2 = 1.4799, df = 3, p = .6869).

186
Classification Summary
An essential feature of discriminant function analysis is the classification of subjects into the mutually
exclusive groups given the knowledge of independent variables. The R commands to produce the data for the
classification summary table are as follows:
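
A sketch using predict(), again with the hypothetical objects from above:

risk.pred <- predict(risk.lda)    # predicted class, posterior probabilities, and discriminant scores
out <- data.frame(actual = riskdat$group, predicted = risk.pred$class,
                  round(risk.pred$posterior, 3), round(risk.pred$x, 3))
head(out, 10)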

We can now compute the percent correctly classified into at-risk and not-at-risk groups based on the math
and English scores. The R commands are as follows:
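
A sketch of the overall classification accuracy:

mean(risk.pred$class == riskdat$group)   # proportion of subjects correctly classified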

We can show the cell counts and the proportions using the following R commands:
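
A sketch of the counts and proportions:

ct <- table(actual = riskdat$group, predicted = risk.pred$class)
ct                # cell counts
prop.table(ct)    # cell proportions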

187
188
Chi-Square Test
The discriminant function analysis does not provide a statistical test for the classification results. Therefore, I
recommend a chi-square test on the group membership and predicted classification data. The R command for
the chi-square test is as follows:
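
A sketch of the chi-square test on actual versus predicted group membership:

chisq.test(table(riskdat$group, risk.pred$class))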

The chi-square results indicated a statistically significant classification result (χ2 = 5.33, df = 1, p = .02). The
classification of at-risk and not-at-risk group membership was statistically significant given the math and
english independent variables. To answer our research question, knowledge of math and English test scores
permits the classification of students into at-risk and not-at-risk groups.

189
Polytomous Dependent Variable
Discriminant analysis can be run when the dependent variable has more than two groups with a set of
independent variables. I refer to the dependent variable as being polytomous—that is, it contains more than
two mutually exclusive categories. The data set is from Field, Miles, and Field (2012, pp. 720–722) and
contains a dependent variable with three categories (cognitive behavior therapy, behavior therapy, and no
treatment) and two independent variables (actions and thoughts). The lda() function in the R package MASS
provides a linear discriminant analysis, which should be used with the prior argument that specifies the prior
probabilities of group membership. The predict() function provides the predicted probability of group membership. A statistical test of
actual versus predicted group membership can be accomplished by using the chisq.test() function—a chi-
square test of statistical significance. The R commands for the analyses would be the same. An example is
provided once again to illustrate the results.
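
A sketch of the three-group run; the data frame name (ocd) and variable names (Group, Actions, Thoughts) are assumptions for the Field et al. (2012) data, not the book’s code:

library(MASS)
ocd.lda <- lda(Group ~ Actions + Thoughts, data = ocd,
               prior = c(1, 1, 1)/3)   # equal prior probabilities (n = 10 per group)
ocd.lda                                # group means and discriminant function coefficients
ocd.pred <- predict(ocd.lda)           # predicted classes and posterior probabilities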

190
Box M Test
If you have closed the R software, then you will need to once again install and load the biotools package. If not,
you can skip this step.
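
A sketch of these steps, continuing the assumed ocd data frame:

install.packages("biotools")   # skip if already installed
library(biotools)
boxM(ocd[, c("Actions", "Thoughts")], ocd$Group)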

191
192
Classification Summary
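
A sketch of the classification summary, continuing the assumed objects from the three-group run above:

ct3 <- table(actual = ocd$Group, predicted = ocd.pred$class)
ct3                           # cell counts
prop.table(ct3)               # cell proportions
sum(diag(prop.table(ct3)))    # overall proportion correctly classified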

193
Chi-Square Test
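
A sketch of the chi-square test of actual versus predicted group membership for the three groups:

chisq.test(table(ocd$Group, ocd.pred$class))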

194
Effect Size
Discriminant analysis effect size can be interpreted by computing a canonical correlation; however, the basic
canonical correlation function, cancor() in the stats package, does not provide a test of statistical significance.
So it is easier to compute the canonical correlation and obtain the Bartlett chi-square test using the cca() and
summary() functions in the yacca package. The R commands would be as follows:
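
A sketch of one possible setup: relating the predictors to indicator-coded group membership is an assumption about the analysis, not the book’s code, and the object names continue the assumed ocd example.

library(yacca)
X <- ocd[, c("Actions", "Thoughts")]
Y <- model.matrix(~ Group, data = ocd)[, -1]   # dummy codes for the three groups
res.eff <- cca(as.matrix(X), as.matrix(Y))
summary(res.eff)        # canonical correlations with Bartlett chi-square tests
# F.test.cca(res.eff)   # alternative F tests from the same package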

The first discriminant function had a canonical r = .30, with r2 = .09. The Bartlett chi-square test indicated a
nonsignificant squared canonical correlation value (r2 = .09, χ2 = 2.56, df = 2, p = .28).

Note: The effect size can also be reported as a partial eta-squared value, the same as in MANOVA. Recall,
partial eta-square is computed as 1 − Λ^(1/S), using Wilks’s Lambda and S = min(p, dfeffect), as defined in Chapter
4. However, the lda() function does not provide the Wilks’s Lambda value. Therefore, you would run the data
in a MANOVA to obtain the values to compute partial eta-square.

195
Reporting and Interpreting
The goal of discriminant analysis is to predict group membership. Group membership (dependent variable)
can have two levels: smoker versus nonsmoker, or it can have three or more levels: cigar, cigarette, and
nonsmoker. The independent predictor variables are selected to maximize the group prediction or
classification accuracy. Beyond the classification accuracy, interpretation can include the discriminant function
that separates the group means. Therefore, the group means become part of the interpretation of results.
Finally, an effect size can be reported for an indication of practical importance. A write-up for the Field et al.
(2012) results might be as follows:

Three groups (cognitive behavior therapy, behavior therapy, no treatment) were distinguished by their actions
and thoughts. The three groups had equal prior probability (33%) of group membership (n = 10 per group).
The first linear discriminant function was Group = .603(actions) − .335 (thoughts) with 82% explained
variance. The actions independent variable had a lower behavior therapy mean (3.7) than either the cognitive
behavior therapy (4.9) or no treatment (5.0) groups. The thoughts independent variable had a lower cognitive
behavior therapy mean (13.4) than either the behavior therapy (15.2) or no treatment (15.0) groups. There
was only a 47% classification accuracy for the two independent variables, which was not statistically significant
(χ2 = 6.15, df = 4, p = .19). The effect size indicated a nonsignificant canonical r = .30 (Bartlett χ2 = 2.56, df =
2, p = .28). The discriminant analysis results were nonsignificant.

196
Summary
Sir Ronald Fisher was best known for his advancements in analysis of variance. Few researchers probably
know of his work in developing discriminant function analysis in the biological sciences. The weighting of
independent variables to predict group classification has been linked to multiple regression when the
dependent variable is dichotomous (Schumacker et al., 2002). The discriminant function analysis can be
extended to a dependent variable with more than two groups (polytomous dependent variable).

This chapter covered both the dichotomous and polytomous dependent variable applications for discriminant
function analysis. The ultimate goal is to achieve a high percentage of correctly classified subjects based on the
weighted independent variables. A chi-square test can provide the basis for determining if the percent
classification is statistically significant. In addition, canonical correlation yields an effect size measure, which
further aids in a practical interpretation of the classification results. The Bartlett chi-square test determines if
the squared canonical r value is statistically significant. A partial eta-square can also be computed when
running the discriminant analysis as a special case in MANOVA. In MANOVA, the discriminant analysis independent
variables would become the dependent variables, while the group membership variable would become the
independent variable.

197
Exercises
1. List the basic assumptions one would require to run a robust discriminant analysis.
2. Explain the difference between MANOVA and discriminant function analysis.
3. Conduct a discriminant analysis.

Issue R command: > data() to see a list of available data sets in R

a. Select and attach amis data file.


b. Print the first 10 lines of data file.
c. Run a linear discriminant function analysis with period as dependent variable. The independent variables are speed and warning.
d. Output group prediction—put in data frame, view first 10 lines.
e. Assess the accuracy of prediction—total percent correct.
f. Show cell counts and proportions.
g. Calculate chi-square for classification accuracy.
h. Calculate effect size.
i. Interpret results

198
Web Resources
Online Free Multivariate Statistics Book

http://little-book-of-r-for-multivariate-analysis.readthedocs.org/en/latest/

Quick-R Discriminant Function Analysis Explanation

http://www.statmethods.net/advstats/discriminant.html

199
References
Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. Thousand Oaks, CA: Sage.

Meyers, L. S., Gamst, G., & Guarino, A. J. (2013). Applied multivariate research: Design and
interpretation (2nd ed.). Thousand Oaks, CA: Sage.

Schumacker, R. E., Mount, R. E., & Monahan, M. P. (2002). Factors affecting multiple regression and
discriminant analysis with dichotomous dependent variable: Prediction, explanation, and classification.
Multiple Linear Regression Viewpoints, 28(2), 32–39.

200
8 Canonical Correlation

Overview
Assumptions
R Packages
CCA Package
yacca Package
Canonical Correlation Example
Effect Size
Reporting and Interpreting
Summary
Exercises
Web Resources
References

201
Courtesy of the State Library of North Carolina

Harold Hotelling (September 29, 1895, to December 26, 1973) introduced the Hotelling T2 statistic in 1931 and the canonical
correlation in 1936 (Hotelling, 1936). R. A. Fisher provided the influence to cultivate his interest in statistics, which he later shared
with Henry Mann (nonparametric Mann–Whitney U statistic) and Abraham Wald (decision theory, statistical sequential analysis) at
Columbia University. He was an associate professor of mathematics at Stanford University from 1927 to 1931. He was a member of
the Columbia University faculty from 1931 to 1946. Hotelling is well-known for encouraging universities to create statistics
departments. In 1972, he received the North Carolina Award for contributions to science, and a street in Chapel Hill, North
Carolina, is named after him. He spent much of his career as a professor of mathematical statistics at the University of North
Carolina at Chapel Hill from 1946 until his death in 1973 at the age of 78.

202
203
Overview
Canonical correlation analyzes the relation among two sets of variables. Traditionally, the two sets have been
defined as dependent variables (Y) and independent variables (X). The purpose is to determine if the two sets
of variables are related (correlated). However, the two sets of variables are each interpreted as a dimension that
relates variables on one side to variables on the other side. For example, a set of student test scores (Y
variables) are related to student characteristics (X variables). The research question of interest is whether
student achievement (as measured by a set of student test scores) is related to student effort (as measured by a set
of student characteristics). Therefore, student achievement and student effort are the names given to the
dimensions that make up the Y and X variables. Canonical correlation is computed based on four correlation
matrices, which provide the individual correlations (Ryy and Rxx) and combined correlations (Ryx and Rxy).
The canonical correlation formula is as follows:
R = Ryy^(−1) Ryx Rxx^(−1) Rxy.

The set of variables on both sides of the equation in a canonical correlation can be combined in different ways
(dimensions). So there are different canonical variates, or linear combinations of the Y and X variables, that are
possible. The different linear combinations form pairs of canonical variates. In a canonical correlation analysis,
we need to determine how many significant canonical variate pairs are in the data set—that is, how many
dimensions are represented. The number of canonical variates equals the number of variables in the smaller of the Y
or X variable sets. The canonical variates are computed in descending order of magnitude, so the first canonical
variate solution will explain the most variance (squared canonical correlation coefficient). The squared
canonical correlation coefficient is an effect size measure, which indicates the amount of variance accounted
for by the two linear sets of variables. The canonical variate results are orthogonal, so each additional solution
is adding to the explained variance.

The canonical correlation coefficient for each canonical variate is tested for statistical significance to indicate
whether the two sets of variables (dimensions) are related. More than one canonical variate solution may be
statistically significant. A concern in canonical correlation analysis is the interpretation of the canonical variates.
This is akin to interpreting constructs (factors) in factor analysis, which are subjectively named. The benefit of
studying sets of variable relations, however, is in the concept of dimensionality and understanding multiple
variable relations in other statistical methods (factor analysis, principal components, and structural equation
modeling).

There are several issues that affect using canonical correlation. These issues are related to statistical analysis of
data in general, and also affect canonical correlation analysis. The issues are as follows:

Multicollinearity
Outliers
Missing data
Sample size

204
If the set of Y variables were highly interrelated (multicollinear), then they would be redundant, implying that
only a single Y variable is needed. Similarly, if the set of X variables were highly interrelated (multicollinear),
then only a single X variable would be required for the analysis. If we conceptualized this in a factor analysis
framework, then each set of variables would be considered unidimensional and would define a single
construct. Essentially, we would be correlating factor scores (Hair, Black, Babin, & Anderson, 2010). Outliers
and missing data dramatically affect the correlation coefficient, sometimes reducing the correlation value and
other times changing the sign of the correlation coefficient (Schumacker, 2014). Sample size and, thus, power
are also issues found in multivariate statistics. Research has indicated that a sample size of 20 subjects per
variable can provide adequate results and power in multivariate statistical methods, although it is strongly
suggested that larger sample sizes are more desirable (Costello & Osborne, 2005; Meyers, Gamst, & Guarino,
2013).

205
Assumptions
The assumptions one should meet when using canonical correlation are similar to the other multivariate
statistics. They are related specifically to the correlation among variables in each set, as well as the correlation
between variables in both sets. The key assumptions to consider are as follows:

Normality (univariate and multivariate within each Y and X set of variables)
Linearity (nonlinearity would affect correlations)
Equal variance (affects correlation among pairs of variables)

Canonical correlation is sensitive to nonnormality because it is computed using correlation or variance–
covariance matrices that reflect linear relations. Skewed data can be corrected using data transformations (log,
probit, etc.); however, kurtotic data are troublesome to correct. We have learned that a nonlinear data relation
yields a zero correlation, which signifies no linear relation. Of course, a nonlinear relation can exist between pairs of
variables, but it will not be detected by the Pearson correlation coefficient. Canonical correlation maximizes the linear
correlation between two sets of variables; thus, it does not reflect any nonlinear variable relations in the data.
The assumption of equal variance permeates all of statistics, whether making mean comparisons or when
correlating data. The variance of the data should be the same around the mean (centroid) for a valid and
unbiased statistical test.

206
R Packages
The cancor() function in the stats package provides the basic canonical correlations and coefficients for the
canonical variates. However, the CCA package provides varimax rotation, graphics, and F tests of the
canonical variates. The data are analyzed using the cancor() function in the R stats package, followed by
additional output from the CCA package. The LifeCycleSavings data set examines the life cycle
savings ratio (personal savings divided by disposable income) from 1960 to 1970 in different countries
(Belsley, Kuh, & Welsch, 1980). The data set has 50 observations and 5 variables: sr = aggregate personal
savings; pop15 = % population under 15; pop75 = % population over 75; dpi = disposable income; and ddpi =
% growth rate of dpi. The data source references and descriptions are obtained in R by issuing the following
commands:

To run canonical correlation analysis, we first find out more about the cancor( ) function and arguments by
issuing the R command:

We also should look at the data. We can access the data and print out a few data lines with the following R
commands:
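
A sketch consolidating these three preliminary steps (dataset documentation, function help, and a data preview):

?LifeCycleSavings        # data source references and variable descriptions
?cancor                  # arguments of the cancor() function
data(LifeCycleSavings)
head(LifeCycleSavings)   # first few data lines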

We are now ready to run the canonical correlation analysis using the cancor() function. It does require the
creation of two separate matrices, one for Y variables and one for X variables.
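
A sketch consistent with the variable sets described below; the object names pop and oec are assumptions:

pop <- LifeCycleSavings[, c("pop15", "pop75")]      # X variables
oec <- LifeCycleSavings[, c("sr", "dpi", "ddpi")]   # Y variables
can.out <- cancor(pop, oec)
can.out$cor                        # canonical correlations
can.out$xcoef; can.out$ycoef       # coefficients for the X and Y sets
can.out$xcenter; can.out$ycenter   # variable means used for centering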

207
There are three Y variables (sr, dpi, ddpi) and two X variables (pop15, pop75). The first canonical correlation,
r = .82, is for the first canonical variate; the second canonical correlation, r = .36, is for the second canonical
variate. These two separate canonical variates indicate different dimensions. The coefficients for the linear set
of X and Y variables are given in the $xcoef and $ycoef matrices, respectively. The first canonical variate
function is expressed as .008(sr) + .0001(dpi) + .004(ddpi) = − .009(pop15) + .048(pop75). The second
canonical variate function is expressed as 3.33(sr) − 7.58(dpi) − 1.22(ddpi) = − .03(pop15) − .26(pop75). The
$xcenter values (35.09, 2.29) would be used if centering was conducted prior to analysis. Similarly, the
$ycenter values (9.67, 1106.76, 3.76) would be used if centering was conducted prior to analysis. Centering is
generally done in the presence of an interaction effect because the parameters of the model (intercept and
regression weights of predictor variables) differ with the level of the moderator variable (Aiken & West, 1991;
Meyers et al., 2013). The canonical correlation equation did not contain an interaction effect, so I did not use
the argument to specify centering.

208
CCA Package
We now turn our attention to another canonical correlation package that provides additional results. We
locate related help pages using the R command:

We need to install and load the CCA package with the following R commands:
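
A sketch of the help search and installation commands:

help.search("canonical correlation")   # locate related help pages
install.packages("CCA")
library(CCA)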

The canonical correlation matrices can be output using the following matcor() function.
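
A sketch, reusing the pop and oec objects defined earlier:

matcor(pop, oec)   # Rxx, Ryy, and the cross-correlation matrices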

The cc() function outputs the canonical correlations, names of variables, canonical variate coefficients, X and Y
canonical scores for each canonical variate function (linear equation), and the correlation of scores for Rxx,
Ryx, Rxy, and Ryy matrices. The canonical scores produced by the variable weights on a canonical variate are
sometimes used in other statistical analyses.
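
A sketch of the cc() run:

res.cc <- cc(pop, oec)
res.cc$cor                   # canonical correlations
res.cc$xcoef; res.cc$ycoef   # canonical variate coefficients
res.cc$scores                # canonical scores and their correlations with the variables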

209
210
211
212
A visual plot of the dimensions produced by the canonical variates can be obtained using the following R
command:
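
A sketch of the plot call; the exact arguments used in the book are not shown, so these are assumptions:

plt.cc(res.cc, type = "i", ind.names = rownames(LifeCycleSavings))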

The four quadrants show a grouping of the countries based on their life cycle savings ratio (personal savings
divided by disposable income) from 1960 to 1970. Japan has a higher ratio on the first dimension than
Ireland, so Japan is saving more than spending.

213
214
yacca Package
The F tests for the canonical variates require using the R yacca package and the F.test.cca() function with the
results from the cca() function. We would first install and load the package as follows:

Next, we would run the cca() function, and then the F.test.cca() function to compute the statistical tests of
significance.
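
A sketch of the yacca steps for the same data:

install.packages("yacca")
library(yacca)
res.cca <- cca(pop, oec)
F.test.cca(res.cca)   # F tests for the canonical variates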

The first canonical correlation, r = .82, is statistically significant (F = 13.49, df = 6, 90, p < .0001). The second
canonical correlation does not report an F test, which is not uncommon in canonical correlation analysis, since
the first canonical variate is usually the only one that is statistically significant.

215
Canonical Correlation Example
The UCLA Institute for Digital Research and Education provided the data for this example
(http://www.ats.ucla.edu/stat/r/dae/canonical.htm). The gender variable was dropped from the data set, and
other R packages and functions were used to conduct the canonical correlation analysis. The data set was
described as follows:

A researcher has collected data on three psychological variables, four academic variables (standardized test
scores), and gender for 600 college freshman students. She is interested in how the set of psychological
variables relates to the academic variables and gender. In particular, the researcher is interested in how many
dimensions (canonical variables) are necessary to understand the association between the two sets of variables.

The canonical correlation focused on the relation of psychological measures to academic achievement
measures. The psychological variables are locus_of_control, self_concept, and motivation. The academic variables
are standardized tests in reading (read), writing (write), math (math), and science (science). Additionally, the
variable female is a zero–one indicator variable with the one indicating a female student.

The following R packages were used:
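
The original package list is not shown; given the functions used below (matcor, cc, comput, cca, F.test.cca), at least these would be needed:

library(CCA)
library(yacca)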

The data set was acquired using the read.csv() function from their website as follows:

A list of the first 10 record lines is given by
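
A sketch of the data acquisition and preview; the file location is an assumption based on the UCLA example page, not a path given in the book:

mm <- read.csv("http://www.ats.ucla.edu/stat/data/mmreg.csv")   # assumed file location
head(mm, 10)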

They specified the psychological variables as X variables and the academic variables as Y variables. The gender
variable was not retained in the set of variables used in this analysis.

The correlation matrices for Rxx, Ryy, and Rxy (Ryx) are computed using the matcor() function in the CCA
package.
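
A sketch; the object names psych.vars and acad.vars are mine, not the book’s:

psych.vars <- mm[, c("locus_of_control", "self_concept", "motivation")]   # X set
acad.vars  <- mm[, c("read", "write", "math", "science")]                 # Y set
matcor(psych.vars, acad.vars)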

216
The canonical correlation analysis is now run using the cc() function. The output listed only shows the
canonical correlations and the raw coefficients used to compute scores.
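
A sketch of the cc() run for this example:

cc1 <- cc(psych.vars, acad.vars)
cc1$cor                # canonical correlations
cc1$xcoef; cc1$ycoef   # raw canonical coefficients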

The three canonical variates reported indicated a decreasing level of correlation, which is expected. The first
canonical variate captures the most explained variance, canonical r = .446. The raw canonical coefficients are
interpreted similar to interpreting regression coefficients. For read, a one-unit increase in reading leads to a
.044 decrease in the first canonical variate, holding all other variables constant.

217
A plot of the canonical variates shows the relation among the two sets of variables. The set of psychological
variables are not as centralized, thus not indicating a unified dimension. In contrast, the set of academic
variables are centralized, thus showing a single dimension. Ideally, we would like each set of variables to be
centralized with a separate location on the spatial map.

The canonical loadings produce the plot above. The loadings can be computed on the canonical variates using
the comput() function. The loadings are correlations between the observed variables and the canonical
variates. The canonical variates would now indicate a latent variable or dimension similar to factor analysis or
cluster analysis.
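
A sketch of the plot and the loading computations (the plt.cc arguments are assumptions):

plt.cc(cc1, var.label = TRUE)              # spatial plot of the two sets of variables
ld <- comput(psych.vars, acad.vars, cc1)
ld$corr.X.xscores   # canonical loadings for the psychological variables
ld$corr.Y.yscores   # canonical loadings for the academic variables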

218
The correlations between the observed variables and canonical variates are called canonical loadings. In general,
the number of dimensions (canonical variates) is equal to the number of variables in the smaller set of
variables; however, the number of significant dimensions is generally less.

We can test the statistical significance of the canonical variates using the F.test.cca() function in the yacca
package.

Next, we would run the cca() function and then the F.test.cca() function to compute the statistical tests of
significance.
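
A sketch of the significance tests:

cca.fit <- cca(psych.vars, acad.vars)
F.test.cca(cca.fit)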

There are three canonical dimensions with the first two being statistically significant (F = 12.77, df = 12,
1,569, p < .0001; F = 2.42, df = 6, 1,188, p = .02). The results are orthogonal and therefore additive. We
would add the two squared canonical correlations to obtain an overall effect size. For example, (.446)2 +
(.153)2 = (.199 + .024) = .22 or 22% explained variance. We can clearly see that the first canonical variate
(dimension) explains most of the variance—thus, most of the relation between the psychological and academic
sets of variables.

When the X and Y variables have very different standard deviations, then standardized coefficients should be
computed to permit an easier comparison among the variables. A researcher should report both the
unstandardized and standardized loadings when possible. The standardized canonical coefficients for each
canonical variate (CV) can be computed as follows:
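
One common way to compute them is to pre-multiply the raw coefficients by the diagonal matrix of variable standard deviations; this is a sketch, not necessarily the book’s exact code:

s1 <- diag(sqrt(diag(cov(psych.vars))))
s1 %*% cc1$xcoef   # standardized coefficients, psychological set
s2 <- diag(sqrt(diag(cov(acad.vars))))
s2 %*% cc1$ycoef   # standardized coefficients, academic set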

The standardized canonical coefficients are interpreted similar to interpreting standardized regression
coefficients. For the variable read, a 1 standard deviation increase in reading leads to a 0.445 standard
deviation decrease in the score on the first canonical variate when the other variables are held constant.

220
Effect Size
The effect size in canonical correlation analysis is the squared canonical r value. It is also the eigenvalue of a
canonical variate, which is expressed as λi = ri². Therefore, the squared canonical correlation
coefficient for each canonical variate is an eigenvalue. Recall, eigenvalues indicate the amount of variance in a
matrix for each eigenvector.

The first canonical variate will explain the most variance—thus, the largest effect. In the UCLA example, the
first canonical variate yielded, canonical r = .446. The squared canonical correlation, r2 = .199 ~ .20, is
considered a medium effect size. Cohen (1988) indicated a general reference for effect sizes: .1 (small), .25
(medium), and .4 (large). However, it is best to know what the effect sizes are in your field of study before
drawing these conclusions.

221
Reporting and Interpreting
The purpose of canonical correlation analysis is to determine whether two sets of variables are related. The
two sets of variables can be related on more than one dimension, so any interpretation of results should
include a discussion of the dimensionality of the results. This can be accomplished by reporting the number of
significant canonical variates (dimensions), the statistical significance (F test), and the effect size(s). A general
reporting follows.

The canonical correlation analysis tested whether psychological variables were related to academic variables.
The psychological variables were locus of control, self-concept, and motivation. The academic variables were read,
write, math, and science test scores. The psychological variables indicated a weaker dimensional structure than
the academic variables. The results in Table 8.1 indicated two statistically significant canonical variates
(dimensions). Table 8.2 indicates the unstandardized and standardized coefficients for the two canonical
variates. In the first dimension, the psychological variables were influenced by locus of control and motivation.
For academic variables in the first dimension, reading and writing were most influential. The first squared
canonical correlation indicated r2 = .20, a medium effect size. The second squared canonical correlation, r2 =
.02, indicated a very small effect size. The second dimension, therefore, only added 2% to the explained
variance.

222
223
Summary
Canonical correlation analyzes the relation between two sets of variables. The set of dependent and
independent variables can be combined in different ways (dimensions). This results in many different canonical
variates, or linear combinations of the Y and X variables. The different linear combinations form pairs of
canonical variates. The goal is to determine how many significant canonical variate pairs are in the data set—
that is, how many dimensions. The canonical variates are computed in descending order of magnitude, so the
first canonical variate will explain the most variance (squared canonical correlation coefficient). The canonical
variate results are orthogonal, so each additional canonical variate solution is adding to the explained variance.

The squared canonical correlation coefficient is an effect size measure, which indicates the amount of variance
accounted for by the two linear sets of variables. We are reminded that the squared canonical correlation
coefficient for each canonical variate is an eigenvalue, λi = ri². Recall, eigenvalues indicate the
amount of variance in a matrix for each eigenvector. Therefore, the canonical variate weights are eigenvectors.
We can express the eigenvalues in terms of the matrix of eigenvectors and the correlation matrix as follows:
E = V′RV,

where E is the matrix of eigenvalues, V′ is the transposed matrix of eigenvector weights, V is the eigenvector
matrix (weights), and R is the correlation matrix.

224
Exercises
1. List the basic assumptions one should meet to run a canonical correlation analysis.
2. Explain the difference between discriminant function analysis and canonical correlation.
3. Conduct a canonical correlation analysis using data from Tabachnick and Fidell (2007, p. 572).

Run the different R functions to report the four matrices, the canonical variates, F test of canonical variates, plot of the dimensions, the
standardized canonical loadings, and effect sizes. Use the type = “i” argument in the plot function. Interpret the results.

Eight belly dancers were measured on two sets of variables. The X variables measured top shimmy (TS) and top circles (TC). The Y
variables measured bottom shimmy (BS) and bottom circles (BC). The canonical correlation analysis was conducted to determine if there
is a statistically significant relation between the movement on the top and the movement on the bottom of a belly dancer.

225
Web Resources
A recommended R tutorial on using data sets can be found at

http://ww2.coastal.edu/kingw/statistics/R-tutorials/dataframes.html

The description of the CCA package can be found at

http://cran.r-project.org/web/packages/CCA/index.html

The list of R data sets can be found at

http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html

The R software can be found at

http://www.r-project.org/

The UCLA canonical correlation example can be found at

http://www.ats.ucla.edu/stat/r/dae/canonical.htm

226
References
Afifi, A., Clark, V., & May, S. (2004). Computer-aided multivariate analysis (4th ed.). Boca Raton, FL:
Chapman & Hall/CRC Press.

Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury
Park, CA: Sage.

Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics. New York, NY: Wiley.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence
Erlbaum.

Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four recommendations
for getting the most from your analysis. Practical Assessment, Research & Evaluation, 10(7), 1–9.

Hair, J. F., Jr., Black, W. C., Babin, B. J., & Anderson, R. E. (2010). Multivariate data analysis (7th ed.).
Upper Saddle River, NJ: Prentice Hall.

Hotelling, H. (1936). Relations between two sets of variables. Biometrika, 28, 321–377.

Meyers, L. S., Gamst, G., & Guarino, A. J. (2013). Applied multivariate research: Design and
interpretation (2nd ed.). Thousand Oaks, CA: Sage.

Schumacker, R. E. (2014). Learning statistics using R. Thousand Oaks, CA: Sage.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston, MA: Allyn &
Bacon, Pearson Education.

227
9 Exploratory Factor Analysis

Overview
Types of Factor Analysis
Assumptions
Factor Analysis Versus Principal Components Analysis
EFA Example
R Packages
Data Set Input
Sample Size Adequacy
Number of Factors and Factor Loadings
Commonality
Scree Plot
Factor Rotation and Extraction: Orthogonal Versus Oblique Factors
Factor Scores
Graphical Display
Reporting and Interpreting
Summary
Exercises
Web Resources
References
Appendix: Attitudes Toward Educational Research Scale

228
http://www.education.com/reference/article/spearman-charles-edward-1863-1945/

Charles Edward Spearman (September 10, 1863, to September 17, 1945) was born in London, England, and died at the age of 82.
Spearman began his interest in psychology at the age of 34 when he began his studies at the Leipzig laboratory of Wilhelm Wundt
(the founding father of psychology). Spearman was greatly influenced by the work of Sir Francis Galton, who researched hereditary
genius and human faculty development. This launched his interest into how individual differences in sensory, motor, and cognitive
abilities could be measured using standardized techniques. During his time, Spearman was also influenced by the biological heredity
component of genius and how correlations of various variables could indicate cognitive ability. He was no doubt aware of the work
by Karl Pearson in developing the correlation coefficient, hereditary research, and eugenics. Spearman developed factor analysis by
analyzing the correlations among variables to measure general intelligence, which supported Galton’s theory of general ability
(Spearman, 1904). He completed his doctoral studies in 1906 at the age of 42. Spearman became a full professor in 1911 in the
Department of Experimental Psychology at University College in London. He received numerous honors, including Fellow of the
Royal Society and membership in the U.S. National Academy of Science. Spearman retired in 1931, but he taught at Columbia
University where both his students, David Wechsler (Wechsler Adult Intelligence Scale—WAIS; Wechsler Intelligence Scale for
Children—WISC) and Raymond B. Cattell (16 Personality Factor Questionnaire), went on to become well-known for their
advances in research and psychological instruments. Spearman is known for developing factor analysis (Spearman g factor), creating
the Spearman rank correlation coefficient, and is considered the founding father of classical test theory.

230
Overview
Factor analysis created by Charles Spearman was based on using the Pearson correlations among pairs of
variables. Factor analysis uses the correlation among a set of observed variables to determine whether they
share common variance on a factor. Initially, every variable defines its own factor; however, through factor
analysis, a set of variables are combined on one or more factors. Factor analysis therefore presumes that some
factors, which are smaller in number than the number of observed variables, explain the shared variance
among the set of observed variables. The challenge in factor analysis is to determine the smaller subset of
variables and name their factors.

We have learned how missing data, outliers, nonnormality, and nonlinearity affect the value of the correlation
coefficient (Schumacker, 2014). Factor analysis therefore requires that these issues have been resolved in the
data screening process, otherwise the Pearson correlations are misspecified. Factor analysis is most useful
when a set of variables are significantly intercorrelated and represent a single theoretical construct—a term
called unidimensional—for example, a set of questions that measures attitude toward science, or attitude
toward statistics, or attitude toward research. If any of the questions were not significantly correlated, they
would be dropped from further consideration in the factor analysis.

231
Types of Factor Analysis
Factor analysis today has been characterized as exploratory factor analysis (EFA) and confirmatory factor analysis
(Thompson, 2004). This chapter explains how to conduct EFA. (The confirmatory factor analysis approach
will be covered in the chapter on structural equation modeling.) EFA provides the basis to group questions or
items together, thus having them represent one or more factors. A factor is indicated by the common shared
variance of a set of variables. The factor is given a name, which represents a construct. For example, a set of
math items share common variance on a single factor, and the test is given the name Mathematics Achievement
Test.

Factor analysis has changed over the years to include other methods and techniques. For example, ordinal
factor analysis is available when items come from a Likert-type scale (SA, A, N, D, SD) on a questionnaire
(Jöreskog, 1969). There are also different types of factor analysis methods (Cattell, 1952, 1965, 1966a, 1966b;
Gorsuch, 1983; Kaiser, 1960) that not only explore variables and items but also include people (individuals)
and time (repeated measurements). To explore all of these types and the many options available requires a
textbook devoted to the expansive topic and a semester long course of study. I therefore address the alpha
factor analysis method based on the correlation among items and present some of the options and choices a
researcher would make.

232
Assumptions
Factor analysis uses a correlation matrix; therefore, data screening must be conducted to determine if any
correlations are affected prior to factor analysis. The fundamental concerns affecting the value of a Pearson
correlation coefficient are as follows:

Sample size
Missing data
Normality
Linearity
Outliers

Kaiser (1970, 1974) provides a test of sampling adequacy: the ratio of the sum of squared correlations to the
total of the sum of squared correlations plus the sum of squared partial correlations. When partial correlations are
small, the Kaiser–Meyer–Olkin (KMO) test of sampling adequacy will be close to 1.0. A value close to 1.0
indicates that the sample size is adequate.

The Bartlett (1954) test of sphericity, although sensitive to sample size, indicates whether the correlations in
the matrix are statistically significant overall. A good rule of thumb is that bivariate correlations should be .3
or higher (Tabachnick & Fidell, 2007).

Missing data reduce and change the Pearson correlation. If more than 10% of the data are missing,
imputation methods should be employed. Some basic imputation methods to handle missing data are mean
substitution, regression, expected maximum likelihood, and response pattern matching (Enders, 2010). When
large samples are available, it may be prudent to omit responses, but not by using listwise (the entire case is
deleted if any variable has a missing value) or pairwise (only variable pairs with missing data are reduced) deletion.
Pairwise deletion can cause different sample sizes for the bivariate correlations. Overall, a researcher should
estimate parameters with and without missing data to assess what impact the missing data structure had on
the analysis.

Normality is not a necessary condition in factor analysis, but it is desirable. Multivariate statistics work best when
variables are normally distributed. When variables display opposite skewness, the Pearson correlation reduces
to r = 0. The issue is how skewness and kurtosis affect the Pearson correlation values. It is best to conduct data
transformations in the presence of skewness. Kurtosis is more difficult to change, requiring better
measurement of the variable.

The Pearson correlation is a measure of linear association; therefore, when data are not linear, the coefficient
is misspecified. A scatter plot can provide a visual check of linearity in the presence of a suspect Pearson
correlation. The scatter plot will also show any outliers or extreme values. A single extreme value can
dramatically affect the mean, standard deviation, and correlation of variables. A researcher should take the
time to know his or her data by properly screening for these issues that affect multivariate statistics, including
factor analysis.

233
The five basic assumptions beyond the data screening issues are as follows:

No multicollinearity (singularity)
No nonpositive definite matrix
Matrix has a positive determinant
Adequate sample size
Reliability

Multicollinearity occurs when variable correlations are too high—that is, they are closer to 1.0. In factor
analysis, the presence of multicollinearity results in a determinant and eigenvalue close to 0. The Pearson
correlation is a measure of linear relations between pairs of variables. If linearity holds, then r = 0 indicates no
relation (independent), while r = 1 indicates perfect relation (collinear). Singularity occurs in the presence of
high bivariate correlations; so basically only one of the variables in the pair should be used, the other should be
dropped from the analysis. The presence of high bivariate correlation is referred to as linear dependency.

A nonpositive definite matrix occurs when the determinant of the matrix is negative, thus no solution is
possible. The determinant of the matrix and the associated eigenvalues need to be positive to extract variance.
Recall that the determinant of a correlation matrix is a measure of the generalized variance. Since negative
variances are not permitted mathematically, computations are not possible when a correlation matrix fails to
have a positive determinant. In factor analysis, the factors define the spatial relations and eigenvalues establish
the length of the axes. Correlation is expressed as the cosine of the angle between two axes: cosine (θ) = ρ =
correlation (X, Y). A perpendicular, or 90° angle, represents zero correlation. The mathematical
representation may not be something you know, but it is related to the Pythagorean theorem (c² = a² + b²), so
variance (X + Y) = variance (X) + variance (Y). When expanded to include covariance (variable association), it
becomes variance (X + Y) = variance (X) + variance (Y) + 2 covariance (X, Y). In the law of cosines, this is
represented as c² = a² + b² − 2ab cosine (θ).

A nonpositive matrix can occur when you have more variables than the sample size. It also occurs more often
when you have high intercorrelation (multicollinearity, singularity). Sometimes, when software adjusts for
unreliability in the measured variables (correction for attenuation), the correlations can become greater than
1.0, which is inadmissible for the calculation of a solution. Wothke (1993) provided an in-depth discussion of
nonpositive matrices in structural equation modeling, which also includes factor analysis and any other
correlation based on multivariate statistics (canonical correlation).

234
The statistical significance of the correlations among the variables also indicates reliability (Cronbach’s alpha).
Reliability is important because it affects validity coefficients—that is, factor loadings—which are computed
in factor analysis. The basic formula indicating the relation between a validity coefficient and the reliability of
the two variables is as follows:
r*xy = rxy / √(rxx · ryy),

where the validity coefficient (rxy; factor loading) is limited by the reliability of the two variables (rxx and ryy).
When considering the reliability of the measures, the correlation can sometimes become greater than 1.0, thus
leading to a nonpositive definite matrix.

235
Factor Analysis Versus Principal Components Analysis
Factor analysis is often confused with principal components analysis (PCA); in fact, many examples that you
may come across incorrectly show principal components as factor analysis. I have found myself on more than
one dissertation committee trying to clarify the difference. You can imagine the conversation that follows
when I attempt to explain the difference. I am therefore providing Figures 9.1 and 9.2 to illuminate the
distinction.

Figures 9.1 and 9.2 represent the difference between EFA and PCA. Factor analysis has the arrows pointing
to the observed variables, while PCA has the arrows pointing from the observed variables to the components.
The diagrams reflect the model differences although the data are organized in the same way with rows being
observations and columns being the variables. I have diagrammed the model structures with circles
representing factors (latent variables or constructs) and squares indicating the observed variables, which are
used in path analysis and structural equation modeling to diagram models.

Figure 9.1 Factor Analysis Model Structure

Figure 9.2 Principal Components Model Structure

The structures of the two models in Figures 9.1 and 9.2 show how the observed variables (X1 to X4) are
related to the underlying latent variables, which are factors (F1 and F2) in factor analysis and components (C1
and C2) in PCA. The direction of the arrows in the two figures displays the very important difference in how
the observed variables are used in EFA and PCA. In factor analysis, the observed variables are expressed as a
linear combination of the factors. In PCA, the components (C1 and C2) are linear combinations of the
observed variables (thought of as weighted sums of the observed variables). Another difference is that in factor
analysis, we seek to account for the covariances or correlations among the observed variables, while in PCA,
we seek to explain a large part of the total variance in the observed variables by the number of components.
Also, factor scores are computed using the factor loadings, while principal component scores are computed
using the observed variable weights.

Another way of distinguishing EFA and PCA is in the diagonal of the correlation matrix—that is, the
variable variance. In PCA, the diagonal values = 1.0 in the correlation matrix. Also in PCA, the sum of the
diagonal values equals the number of variables—that is, the maximum variance to be accounted for including
unique (residual) variance. If all PCA components are used, then PCA duplicates the original correlation
matrix and the standard scores of the variables. In EFA, the diagonal values = SMC (squared multiple
correlation, R2), which reflects only the variance of each variable with the other observed variables. This
excludes the unique (residual) variance. The shared variance in the diagonal of the matrix is the commonality
estimate. The sum of the communalities is less than the total variance in the set of variables because the
unique (residual) variance is not included. Therefore, the combination of factors only approximates the
original correlation matrix (variance–covariance matrix) and scores on the observed variables. Conceptually,
EFA analyzes covariance (commonality among variables), while PCA analyzes variable variance. EFA
attempts to reproduce the correlation matrix (variance–covariance matrix) with a few orthogonal factors. PCA
extracts the maximum variable variance from a few orthogonal components.

237
EFA Example

238
R Packages
EFA in R requires the use of several packages. They are installed, loaded, and described as follows:
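
The original installation commands are not shown; given the functions used in this chapter (paf, itemanal, cortest.bartlett, corr.p, fa, fa.parallel), a plausible set is:

install.packages(c("psych", "rela", "GPArotation"))
library(psych)         # fa(), fa.parallel(), cortest.bartlett(), corr.p()
library(rela)          # paf(), itemanal()
library(GPArotation)   # rotation methods called by fa()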

239
Data Set Input
The EFA is conducted using a 30-item instrument that measured attitudes toward research (Papanastasiou &
Schumacker, 2014; Appendix). We first read the comma-separated file, attr30.csv, from a file directory that is
located by the setwd() function. The argument selected the root directory, but this could also be the path
directory to a folder where the file is located. The read.table() function is used to read the comma-separated
file (*.csv), which contains the variable names on the first line (Q1 to Q30). The header = TRUE argument
expression permits reading in the variable names in the data set.

You can view the first few lines of the data file and the last few lines of the data file with these two R
commands:
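
A sketch of the data input and preview steps described above (the working directory is an assumption):

setwd("C:/")   # root directory, or a path to the folder where the file is located
factdat <- read.table("attr30.csv", header = TRUE, sep = ",")
head(factdat)   # first few lines of the data file
tail(factdat)   # last few lines of the data file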

Note: An alternative is to use the > file.choose() command, which opens the directory on the computer to
search and select the file.

The data file, factdat, contains 30 questions and n = 541 responses. You do not need to use the raw data file,
especially if it is extremely large, so another option is to create a correlation matrix as follows:

Note: p Values for correlations in the matrix can be obtained using the corr.p() function. The R command
using the print() function for output is as follows:
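
A sketch of the correlation matrix and the corr.p() output:

corfact <- cor(factdat)             # 30 x 30 correlation matrix
print(corr.p(corfact, n = 541))     # correlations with p values (psych package)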

240
241
Sample Size Adequacy
The rela package permits a determination of correlation matrix sphericity, sample size adequacy, and internal
consistency of item response (Cronbach’s alpha). The paf() function computes these tests. In addition, it is
important to find out if the determinant of the correlation matrix is positive. We would use the following R
commands to do so:
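
A sketch of these checks:

paf.out <- paf(as.matrix(factdat))   # principal axis factoring diagnostics (rela package)
summary(paf.out)                     # printed output includes the KMO and Bartlett values
det(corfact)                         # determinant of the correlation matrix (should be positive)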

KMO is .948, which is close to 1.0, thus indicating that the sample size is adequate for factor analysis. The
Bartlett test indicates whether the correlation matrix is an identity matrix—that is, whether all diagonal
elements are 1.0 and all off-diagonal elements are 0.0, which implies that all the variables are uncorrelated.
We desire a statistically significant Bartlett chi-square, which indicates that statistically significant correlations
exist to proceed with factor analysis. The Bartlett chi-square = 10,397, but we do not know if it is statistically
significant. We need to run a test of significance using the cortest.bartlett() function in the psych package.
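
A sketch of the Bartlett test:

cortest.bartlett(corfact, n = 541)   # chi-square, df, and p value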

There are other functions that perform similar tests:

or, using Fisher z-score equivalents:

or, a function that compares two correlation matrices (not used in this example):
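
The specific functions the author had in mind are not listed; the psych package offers related tests such as the following (a hedged sketch):

cortest.normal(corfact, n1 = 541, fisher = FALSE)   # normal-theory test that R is an identity matrix
cortest.normal(corfact, n1 = 541, fisher = TRUE)    # the same test using Fisher z equivalents
# cortest.mat() compares two correlation matrices (not used in this example)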

The KMO test indicated adequate sample size to proceed with the EFA. The Bartlett chi-square test was
statistically significant; therefore, we reject the null hypothesis that R = I (correlation matrix = identity
matrix), so the correlation matrix contains statistically significant correlations. The determinant of the
correlation matrix was positive, so we can extract common shared variance. A preliminary check of the
sampling adequacy, lack of an identity matrix, and positive determinant should all be analyzed prior to
conducting factor analysis.

Note: The paf() function is used for principal axis factoring, so it is being used to obtain preliminary
information regarding the dimensionality and scale functioning of the items.

The internal consistency reliability, Cronbach’s alpha, is obtained using the itemanal() function in the rela
package. It provides item means and standard deviations, skewness and kurtosis, covariance and correlation
matrices, bootstrap simulations, and item-total correlations. The output only shows the reliability coefficient
and confidence interval. The Cronbach’s α = .57 is low. A single item can reduce the reliability coefficient, so
examination of the correlation matrix is warranted. If any item is negatively correlated with any other item, it
should be removed. The confidence interval (95%, two standard errors) ranges from .34 to .80. If you output
the correlation matrix (> corfact), it will show that Q30 is negatively correlated with a few of the other items.
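
A sketch of the item analysis; summary() is assumed to print the scale statistics, and printing alpha.out directly also works:

alpha.out <- itemanal(as.matrix(factdat))   # item/scale analysis (rela package)
summary(alpha.out)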

If we remove Q30 and compute Cronbach’s alpha with 29 items, the coefficient increases from .57 to .64, and
the confidence interval becomes narrower.
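
A sketch of the 29-item rerun after dropping Q30 (column 30):

alpha.29 <- itemanal(as.matrix(factdat[, -30]))
summary(alpha.29)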

243
Number of Factors and Factor Loadings
An EFA can be conducted using functions in the psych package. You may wish to read more about the
contents and features in the psych package. There are several possible approaches for factor analysis using the
fa() function: fa.poly() for factoring categorical items that uses tetrachoric or polychoric correlations (useful
with Likert-type scaled questionnaire items), factor.minres() that uses least squares estimation to minimize
the residual values, factor.pa() for principal axis factoring using the least squares estimation method,
factor.wls() using weighted least squares estimation, and principal() for principal components analysis. Use
the following R command to obtain the PDF document that further explains the options available.

The arguments that will need to be specified in the fa() function can be examined by the following:
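
Plausible commands for both steps (the exact commands are not shown in the text):

help(package = "psych")   # package index with a link to the full documentation
?fa                       # help page describing the fa() arguments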

Note: These commands will automatically open your browser (usually Firefox) and display the information.

The fa() function includes the wls option, which uses the weighted least squares estimation method iteratively
to obtain factor loadings. It minimizes the squared residuals, and the weights are based on the independent
contribution of each item (question). The nfactors = 2 argument specifies that only two factors should be used
to account for variance among the variables: one factor for common variance and the other factor for unique
variance. The R commands are as follows:
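
A sketch of the two-factor run; the rotation argument used in the book is not shown, so the fa() default is assumed:

efa2 <- fa(corfact, nfactors = 2, n.obs = 541, fm = "wls")
efa2   # loadings (WLS1, WLS2), h2, u2, explained variance, chi-square, and fit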

244
The two-factor structure was tenable (χ2 = 2210.4, p < .0001, Fit = .98). The first factor (WLS1) explained
74% common variance, and the second residual factor (WLS2) indicated 26% unexplained variance. We desire
the factor loadings in the first column (WLS1) to be larger than the factor loadings in the second column
(WLS2). Except for Q30 (boldfaced), all questions have a higher factor loading in the first column. The h2
column indicates that the common or shared variance contributed to the factor structure, while the u2 column
indicates the unique or residual variance. We desire that h2 values, called commonality estimates, be larger than
u2 values, called residual estimates. For example, Q1 has h2 = .60 or 60% common variance and u2 = .40 or
40% unexplained variance. There are several questions, however, with residual variance larger than common
variance (13 boldfaced items in u2 column).

Commonality
The two-factor solution (1 common factor and 1 unique factor) provided a WLS1 common factor explained
variance of 74%. The h2 values represent each variable’s common variance or commonality. The factor
explained variance is the sum of the squared factor loadings (∑h2) or commonality estimates divided by
the number of questions (m = 30): Factor variance = ∑h2/m. The unique factor denotes the unexplained
variance of 26%, which is 1−Factor variance. A researcher could attempt to reduce the unexplained factor
variance by adding more questions, taking another sample of respondents, or further investigating any subject
response error and any systematic responses (circle all of one scale choice). The output also provided a test of
the hypothesis of whether the two factors sufficiently reproduced the correlations in the correlation matrix.
The chi-square was statistically significant. The fit index = .98 indicated that 98% of the variable covariances
(correlations) were reproduced by the two-factor solution (common and unique factor structure). We could
easily conclude that the 30-item questionnaire is unidimensional and continue with naming the common factor
Attitude Toward Research.

Scree Plot
However, since 13 questions had more residual than common variance, it is possible that another common
factor might be present. Basically, the 30 items may be represented by two dimensions or constructs (yet to be
named). We therefore need to produce a scree plot (Cattell, 1966c). The plot() function will graph the
eigenvalues from the two-factor solution (Figure 9.3). The type = “b” argument yields both a point and a line
in the graph. The other arguments provide labels for the main title and the Y and X axes, plus it scaled the X
axis for clarity of the scree plot.
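
A sketch of the plot() call described above; using the $e.values component is an assumption about which eigenvalues were plotted:

plot(efa2$e.values, type = "b", main = "Scree Plot",
     ylab = "Eigenvalues", xlab = "Factors", xlim = c(0, 30))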

Figure 9.3 Scree Plot

The fa.parallel() function in the psych package can also produce a scree plot of actual and simulated data based
on eigenvalues of the factor analysis (Figure 9.4). The R command is as follows:
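
A sketch of the parallel analysis:

fa.parallel(factdat, fm = "wls")   # compares observed eigenvalues with simulated and resampled data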

Figure 9.4 Parallel Analysis Scree Plot

The second parallel scree plot more clearly shows the number of possible factors based on eigenvalues > 1.0
(solid line). The scree plot indicates that three factors may be present to explain the common variance among
the variables.

The factor analysis was rerun with three factors to see what happens with the variable covariances.
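
A sketch of the three-factor rerun:

efa3 <- fa(corfact, nfactors = 3, n.obs = 541, fm = "wls")
efa3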

247
When examining the three-factor solution (two common factors and one unique factor), some of the items
defined a second factor (boldfaced). Q30 was the only item that had a higher factor loading on the second
common factor. The other boldfaced factor loadings were lower but positive (8 questions). Ideally, if all the factor loadings on the second factor were negative, we could be certain those items are not represented by a second factor. Sixteen questions had negative factor loadings on WLS2, and 5 had factor loadings less than .30 (a
recommended subjective cutoff point). So 21 out of 30 questions are clearly indicated on the WLS1 factor.
There are 8 questions with positive factor loadings on the second common factor, WLS2, but they are lower
than those on the first common factor (WLS1), which indicates that they have a higher common shared
variance with WLS1 than with WLS2. What we should notice is the substantial reduction in the residual
variance factor (WLS3) and the increased h2 values for many of the questions.

The proportion of explained factor variance is now distributed as WLS1 (67%) and WLS2 (24%). The
unexplained factor variance is WLS3 (9%). This shows that the unexplained factor variance has been reduced
from 26% (two-factor solution) to 9% (three-factor solution). The chi-square statistic was statistically
significant, thus supporting the three-factor solution. The fit index = .99, so 99% of the covariance
(correlations) in the matrix were reproduced. The 1% increase is not considered a substantial improvement in
the factor structure.

The two-factor solution seems more reasonable and better supported than the three-factor solution. We should further examine Q30, especially since it was negatively correlated with other items (which affected Cronbach’s alpha); because the item has face validity, rewording it may help better relate it to the construct Attitude Toward Research. The single common factor (unidimensional construct) indicated 74%
explained variance. Additional use of the instrument and reporting of its factor structure will help sort out its
usefulness in research—basically future reporting of the validity and reliability of the factor scores.

Factor analysis does not provide names for the derived factors; this is done subjectively by the author. The
naming of factors is just one area of concern for scholars. Factor analysis does not provide a final definitive
solution—that is, there are many possible solutions. The two-factor and three-factor solutions could be
extended up to a 30-factor solution (one for each question). A simple example may help further our
understanding of this issue called indeterminacy. If X = T + E, and we are given X = 10, then what are the
possible values for T and E? More than one solution is possible: T = 5 and E = 5, T = 7 and E = 3, and so
forth. The notion of an indeterminate method is different from using different estimation methods. Estimation
methods do not provide an exact solution; rather, they provide approximations of parameter estimates. So the
different estimation methods provide various methods for obtaining estimates of factor loadings. In some
cases, no solution exists due to problems in the data (nonpositive definite matrix). Steiger and Schönemann
(1978, chap. 5) provided a history of factor indeterminacy. Steiger (1996) further delineated the historical–
sociological issues in the discussion surrounding the history of factor indeterminacy, which centered in part on Edwin Bidwell Wilson, who attempted to explain to Spearman, while dining at Harvard, that his two-factor g theory does not have a unique solution—that is, it has an indeterminate number of possible solutions.

Factor Rotation and Extraction: Orthogonal Versus Oblique Factors
The two-factor solution assumed that the factors were orthogonal (uncorrelated) when analyzing the factor
structure of the 30-item Attitude Toward Research instrument. Orthogonal factors assume that variable factor
loadings are unique to each factor (Jennrich, 2001). Oblique factors assume that variable factor loadings are
shared between two or more factors (Jennrich, 2002). The first diagram shows orthogonal factors where X1
and X2 define the first factor and X3 and X4 define the second factor. The second diagram shows oblique
factors where X1 to X4 have factor loadings on both factors, thus sharing common variance, and the factors
are correlated.

Given the apparent positive factor loadings of some items on the second factor in the three-factor solution,
the two common factors may be considered oblique factors. For example, Q6 in the three-factor solution had the same factor loading (.56) on both common factors. The varimax option (orthogonal rotation) and oblimin option (oblique rotation)
were both run to further investigate the factor structure of the 30 questions.
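A sketch of the two rotation runs described above (the object names are hypothetical):

> fa3.varimax <- fa(factdat, nfactors = 3, fm = "wls", rotate = "varimax")
> fa3.oblimin <- fa(factdat, nfactors = 3, fm = "wls", rotate = "oblimin")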

The orthogonal varimax rotated solution indicated the same items on the second factor as the no rotate default
solution; however, the factor variance was very different. The no rotate solution gave WLS1 (67%) and WLS2
(24%) with residual factor variance of WLS3 (9%) compared with the varimax solution, which gave WLS1
(42%) and WLS2 (32%) with residual factor variance of WLS3 (26%). It would be difficult to argue that the
orthogonal varimax rotation was beneficial to understanding the factor structure of the items.

The oblique oblimin rotated solution indicated that the same nine items (Q1, Q6, Q8, Q9, Q14, Q16, Q23,
Q26, and Q30) would fall on a second factor as the no rotate default solution and varimax solution; however,
the factor variance again was very different. The no rotate solution gave WLS1 (67%) and WLS2 (24%) with
residual factor variance of WLS3 (9%) compared with the oblimin solution, which gave WLS1 (40%) and
WLS2 (31%) with residual factor variance of WLS3 (29%).

The two common factors had a higher correlation with the residual factor (r = .53) than between themselves (r
= .23). It would be difficult to argue that the oblique oblimin rotation was beneficial to understanding the
factor structure of the items.

We could continue to run other possible scenarios, for example, ordinary least squares solution using
minimum residual, no rotation, and nfactors = 2. This factor solution indicated variables on a single factor or
unidimensional construct more so than the weighted least squares approach. Q30 also loaded on the first factor in this solution. This helps confirm my selection of a single factor or unidimensional construct. The 73%
factor variance was only slightly lower than the 74% factor variance achieved earlier.

We can check these results further by running an orthogonal varimax rotation using the minimum residual
approach with nfactors = 3. Results showed something different. The factor variance was not incremental: The
second factor (MR2) and the residual factor (MR3) had larger factor variances than the first common factor,
MR1 (26%).

We can also continue by running the oblique oblimin rotation using the minimum residual approach with
nfactors = 3. It also showed a different result: The residual factor explained more variance (40%). The first and
second factors had a correlation (r = .55) compared with the earlier findings (r = .23). This would imply that
the first and second factors were correlated; thus, it was not a unidimensional factor solution.

Ok, time to stop exploring. This could go on indefinitely! Or is that indeterminately?

Factor Scores
The development of factor scores is most prominently known by the names Bartlett, Anderson, and Rubin. A
brief biography of each reveals their achievements and contributions to the field of statistics. Their
contributions go well beyond the development of factor scores in factor analysis.

These three prominent mathematicians/statisticians have become known for their approach to computing
factor scores. The three basic methods to compute factor scores are called the regression method (Thurstone),
the Bartlett method, and the Anderson–Rubin method. The regression method computes factor scores that
have a mean of 0 and a variance equal to the SMC of item and factor (h2, the commonality). The scores may be
correlated even when factors are orthogonal. The Bartlett method computes factor scores that also have a
mean of 0 and a variance = SMC. However, the SS of the unique factor for the variables is minimized. The
Anderson–Rubin method computes factor scores that have a mean = 0 and a standard deviation = 1, a
modification of the Bartlett method, which ensures orthogonality of the factors and uncorrelated factor scores.
DiStefano, Zhu, and Mindrilă (2009) provided an understanding of how to use the factor scores.

The regression method is the most commonly used. The factor loadings are used as regression weights in a
linear equation. The raw data file, factdat, is required when computing individual predicted scores. In
addition, the scores = “Thurstone” argument selects the type of factor scores, which are z scores using the
regression method.
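A sketch of the call, assuming the raw data file factdat and the hypothetical object name fa2 from earlier:

> fa2 <- fa(factdat, nfactors = 2, fm = "wls", rotate = "none", scores = "Thurstone")
> head(fa2$scores)    # individual factor scores (z scores) on each factor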

The individual factor scores on the common factor (WLS1) were computed using the factor loadings for the
30 items multiplied by the individual person’s response to the question. The abbreviated regression equation is
as follows:
Scorei=.57(Q1)+.55(Q2)+.74(Q3)...+.74(Q28)+.50(Q29)+.35(Q30)

We should notice that the scores are in standard score form (Thurstone z scores), which a hearty statistician
could easily decipher. However, the average untrained person would be quickly confused, which is why many
test publishers use scaled scores, for example, NCE (normal curve equivalent) scores. We can convert the z
scores to a scaled score that ranges from 0 to 100, which provides a meaningful understanding of the scores
(Schumacker, 2004, chap. 10). The formula uses the high and low factor scores:

We can find the high score of 3.3082 and the low score of −3.0387 by the following R commands:

We can now use these high and low scores to obtain the mean and standard deviation that will be used to
calculate the scale scores. The mean of 47.877 and the standard deviation of 15.756 are computed as follows:

We would use the mean and standard deviation in an acceptable linear transformation to compute the scaled
scores, which will range from 0 to 100. The scaled scores would be computed as follows:

Check: We can check our formula to show that the range of scaled scores will fall between 0 and 100. The
calculations are given as follows:

The individual scaled scores can be computed with a formula expression, which is given as follows:

The mean and standard deviation of the scaled scores can be computed in the following way:
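A sketch of the whole scaling workflow described above, assuming the regression-based factor scores on the common factor are in fa2$scores[, 1] and that the linear transformation maps the lowest factor score to 0 and the highest to 100 (an assumption that reproduces the reported mean of 47.877 and standard deviation of 15.756):

> fscores <- fa2$scores[, 1]
> high <- max(fscores)                          # 3.3082
> low <- min(fscores)                           # -3.0387
> scale.mean <- (0 - low) / (high - low) * 100  # 47.877
> scale.sd <- 1 / (high - low) * 100            # 15.756
> scaled <- scale.mean + scale.sd * fscores     # individual scaled scores, 0 to 100
> scale.mean + scale.sd * high                  # check: approximately 100
> scale.mean + scale.sd * low                   # check: approximately 0
> mean(scaled); sd(scaled)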

The mean is identical to that in our formula, and we should not be concerned that the standard deviation is not
exactly the same value. The interpretation of the scaled scores is now easier than discussing the positive and
negative z scores. Scaled scores ranging from 75 to 100 would imply a positive attitude toward research, while
scaled scores ranging from 0 to 25 would imply a negative attitude toward research. Scaled scores in the
middle quartiles could be interpreted as having a moderate attitude toward research.

The factor scores and scaled scores should have a similar distribution. They also appear normally distributed;
that is, the factor scores fall within a range of about ±3. A histogram, using the hist() function, can display the
distribution of scores.
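A sketch of the histograms, using the fscores and scaled objects from the sketch above:

> par(mfrow = c(1, 2))
> hist(fscores, main = "Factor Scores", xlab = "z score")
> hist(scaled, main = "Scaled Scores", xlab = "0 to 100 scale")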

http://www.york.ac.uk/depts/maths/histstat/people/bartlett.gif

Maurice Stevenson Bartlett (June 18, 1910, to January 8, 2002). In 1929, Bartlett received a scholarship to Queens College at
Cambridge, where he studied mathematics. In 1932, he coauthored a paper with his statistics professor, John Wishart (Wishart
distribution). He later joined Egon Pearson (Karl Pearson’s son) in the statistics department at the University College, London,
where he also worked with R. A. Fisher. In 1946, he was invited by Harold Hotelling to spend time at the University of North
Carolina, Chapel Hill. From 1947 to 1960, Bartlett was the chair of the statistics department at the University of Manchester. He
accepted a position to chair biomathematics at Oxford, and he held that position until his retirement in 1975. He wrote several
papers on topics ranging from tests of homogeneity of variance, effects of nonnormality, multiple regression, time series, stochastic
processes, and spatial processes.

http://statweb.stanford.edu/~ckirby/ted/

Theodore Wilbur Anderson (June 5, 1918, to present) was born in Minneapolis, Minnesota, and earned his AA degree in 1937 from
North Park College in Chicago, Illinois. In 1939, he received his BS in mathematics from Northwestern University in Evanston,
Illinois. He received his MA in 1942 and PhD in 1945, both while at Princeton. He worked with Sam Wilks and John Tukey
during his time at Princeton. From 1946 to 1967, Anderson worked at Columbia University, where he interacted with Wald,
Scheffe, and Levene (all known by their work on the significance of an explanatory variable, post hoc test, and test of homogeneity of
variance, respectively). In 1967, he moved to Stanford University, becoming an emeritus professor in 1988. He turned 90 in 2008
with a birthday party at Stanford. He wrote the popular An Introduction to Multivariate Statistics book in 1958. He is known for the
development of the Anderson–Darling test, Anderson–Rubin test, and the Anderson–Bahadur algorithm. During his career, he was
awarded the Guggenheim Fellowship (1946), served as the editor of Annals of Mathematical Statistics (1950–1952), was elected
president of the Institute of Mathematical Statistics (1962), was elected fellow of the American Academy of Arts and Sciences
(1974), and became a member of the Norwegian Academy of Science and Letters.

http://www.stat.purdue.edu/people/faculty/hrubin

Herman Rubin (October 27, 1926, to present) was born in Chicago, Illinois. He showed early signs of brilliance in mathematics
while in high school. He jointly attended high school and the University of Chicago. He received his high school diploma in 1943,
BS in 1944, MS in 1945, and in 1948, he earned his Ph.D. in mathematics at the young age of 21 (all from the University of
Chicago). Rubin had a keen interest in solving sets of simultaneous linear equations and eventually helped develop a limited
information maximum likelihood estimation algorithm. In 1949, Rubin joined the Department of Statistics at Stanford. His interests ranged from decision theory to Bayesian statistics. Dr. Rubin moved several times in his career from Stanford to the University of
Oregon, then to Michigan State University, and finally to Purdue, where he currently works in the Department of Statistics and
Mathematics at the age of 88.

Graphical Display
There are a few options for plotting the factor structure. I chose to use the fa.diagram() function to show the
item to factor relations and the plot() function to show the item clustering for both factors. You can have both
plots appear in the same window using the par() function. The R commands are as follows:
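A sketch of those commands, assuming the two-factor solution is stored in the hypothetical object fa2:

> par(mfrow = c(1, 2))
> fa.diagram(fa2)    # item-to-factor relations
> plot(fa2)          # item clustering on the two factors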

What is so remarkable about the fa.diagram() function is that after all the different estimation methods and
rotation methods, we can visually see what the factor structure looks like. You could easily run all the different
methods, output the results into different files, and then diagram all of them. This provides a rather quick visual check on the factor structures that appear in your data!

The diagram is for the two-factor solution, where Q30 was questionable because it loaded on a second factor
(Q30: Research is a complex subject). We find that Q6 is also questionable because it loaded equally on both
factors (Q6: Research scares me). Notice that the factor loadings ranged from .8 to .4 on the first factor (WLS1).
Sometimes, it is nice to visualize the factor structure. The plot() function displays how the items are cross
referenced on the two factors.

Reporting and Interpreting
There are many approaches to EFA, so any write-up should indicate as much information as possible about
the selections made for the analysis. A basic sample write-up for the Attitude Toward Research instrument
could be as follows:

An EFA was conducted on a 30-item instrument with 541 subjects designed to measure Attitude Toward
Research, which was considered a unidimensional trait. The Bartlett test indicated that the correlation matrix was not an identity matrix; that is, the null hypothesis of sphericity was rejected. The KMO index indicated an adequate sample size. The determinant of the correlation matrix was positive. The
EFA used the weighted least squares extraction method with no rotation. A scree plot indicated three possible
factors; however, the authors chose a single factor that had the most explained variance (74%) compared with
other solutions. Cronbach’s α = .57, which is considered a low score reliability, due in part to a single poorly
worded question. Factor scores were produced using the regression method, and scaled scores were computed
that ranged from 0 to 100 for ease of interpretation.

Summary
Factor analysis provides many different estimation methods, different rotation methods, and types of factor
scores. EFA is a truly exploratory approach, where you seek to determine the subsets of variables and number of
factors. It is considered a data reduction method because you are attempting to find a few factors that explain
the variable relations (correlations). Researchers generally attempt to build unidimensional instruments, thus a
common factor and a unique factor. There are multidimensional instruments (GRE, MMPI, etc.) that were
constructed to have more than one factor. For many researchers, factor analysis is an artistic endeavor that
uses an exploratory approach to find structure and meaning in the data. The subjective nature of naming the
factor, and the many variations and choices, makes EFA a challenging multivariate statistical approach. Best
practices in EFA were summarized by Costello and Osborne (2005) to provide guidance on decisions related
to extraction, rotation, number of factors, and sample size.

Exercises
1. What are five basic assumptions a researcher should meet when conducting a factor analysis?
2. Briefly explain the difference between factor analysis and principal component analysis.
3. Explain the difference between the regression method, Bartlett, and Anderson-Rubin approaches to factor scores.
4. Conduct EFA on the following data set (Harman.8) in the psych package. You can obtain a list of data sets globally by > data() or
specifically for the psych package by the following:

Information about the Harman.8 data set is available from the following:

Report the following:

1. Scree plot with determination of number of factors.


2. EFA using fm = “minres” (ordinary least squares solution), and rotate = “none” arguments. Report EFA results for nfactors = 2 and
nfactors = 3.
3. Interpret an EFA factor structure.

Web Resources
EFA using R Tutorial

http://rtutorialseries.blogspot.com/2011/10/r-tutorial-series-exploratory-factor.html

Nonpositive definite matrix discussion

http://www2.gsu.edu/~mkteer/npdmatri.html

References
Cattell, R. B. (1952). Factor analysis. New York, NY: Wiley.

Cattell, R. B. (1965). A biometrics invited paper. Factor analysis: An introduction to essentials II. The role of
factor analysis in research. Biometrics, 21, 405–435.

Cattell, R. B. (1966a). Handbook of multivariate experimental psychology. Chicago, IL: Rand McNally.

Cattell, R. B. (1966b). The meaning and strategic use of factor analysis. In Handbook of multivariate
experimental psychology (pp. 174–243). Chicago, IL: Rand McNally.

Cattell, R. B. (1966c). The scree test for the number of factors. Multivariate Behavioral Research, 1(2),
245–276.

Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four recommendations
for getting the most from your analysis. Practical Assessment, Research & Evaluation, 10(7), 1–9.
Retrieved from http://pareonline.net/getvn.asp?v=10&n=7

DiStefano, C., Zhu, M., & Mindrilă, D. (2009). Understanding and using factor scores: Considerations for
the applied researcher. Practical Assessment, Research & Evaluation, 14(20), 1–11. Retrieved from
http://pareonline.net/getvn.asp?v=14&n=20

Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford Press.

Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Jennrich, R. I. (2001). A simple general procedure for orthogonal rotation. Psychometrika, 66, 289–306.

Jennrich, R. I. (2002). A simple general method for oblique rotation. Psychometrika, 67, 7–19.

Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34, 183–202.

Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20, 141–151.

Kaiser, H. F. (1970). A second generation little jiffy. Psychometrika, 35, 401–415.

Kaiser, H. F. (1974). An index of factorial simplicity. Psychometrika, 39, 31–36.

Papanastasiou, E., & Schumacker, R. (2014). Rasch Rating Scale analysis of the attitudes toward research
scale. Journal of Applied Measurement, 15(2), 189–199.

Schumacker, R. E. (2004). Rasch measurement: The dichotomous model. In R. Smith & E. Smith (Eds.),
Introduction to Rasch measurement (pp. 226–257). Maple Grove, MN: JAM Press.

Schumacker, R. E. (2014). Learning statistics using R. Thousand Oaks, CA: Sage.

Spearman, C. (1904). “General intelligence,” objectively determined and measured. American Journal of
Psychology, 15, 201–293.

Steiger, J. H. (1996). Coming full circle in the history of factor indeterminacy. Multivariate Behavioral
Research, 31(4), 617–630.

Steiger, J. H., & Schönemann, P. H. (1978). A history of factor indeterminacy. In S. Shye (Ed.), Theory
construction and data analysis in the behavioral sciences (pp. 136–178). San Francisco, CA: Jossey-Bass.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). New York, NY: Pearson
Education.

Thompson, B. (2004). Exploratory and confirmatory factor analysis: Understanding concepts and
applications. Washington, DC: American Psychological Association.

Wothke, W. (1993). Nonpositive definite matrices in structural modeling. In K. A. Bollen & J. S. Long
(Eds.), Testing structural equation models (pp. 256–293). Newbury Park, CA: Sage.

Appendix

Attitudes Toward Educational Research Scale

The following statements refer to some aspects of educational research. Please answer all the questions sincerely. DO NOT
DISCLOSE YOUR IDENTITY ANYWHERE.

Circle one of the numbers opposite each of the statements that follow.

By selecting number 1 you indicate that you strongly disagree.

By selecting number 7 you indicate that you strongly agree.

10 Principal Components Analysis

Overview
Assumptions
Bartlett Test (Sphericity)
KMO Test (Sampling Adequacy)
Determinant of Correlation Matrix
Basics of Principal Components Analysis
Principal Component Scores
Principal Component Example
R Packages
Data Set
Assumptions
Number of Components
Scree Plot
Reporting and Interpreting
Summary
Exercises
Web Resources
References

© National Portrait Gallery, London

Karl Pearson (March 27, 1857, to April 27, 1936) invented principal component analysis (PCA) in 1901 as part of the principal axis
theorem in mechanics. PCA was later independently developed and named by Harold Hotelling in the 1930s. Pearson was a very
influential biometrician of his time. He contributed to the discipline of mathematical statistics, biometrics, meteorology, hereditary
research, and eugenics. His major influence came from Sir Francis Galton (Charles Darwin’s cousin). Eugenics fell out of favor after
Hitler’s regime.

Pearson was privately educated at University College School. In 1876, he enrolled in King’s College, Cambridge, to study
mathematics. In 1879, he enrolled at the University of Heidelberg in Germany and studied physics and also spent time at the
University of Berlin. He returned to England in 1880 and held various professorships. In 1911, Pearson founded the statistics
department at University College, London. When Galton passed away in 1911, he left money to fund a chair in eugenics at the
University of London, which Pearson held. Pearson remained active in the Biometric and Galton laboratories until his retirement in
1933, refused knighthood in 1935, and worked until his death in 1936. He made substantial contributions to statistics, including
founding the Biometrika journal, the Annals of Eugenics (now Annals of Human Genetics) journal, the Pearson correlation coefficient,
the Pearson chi-square, and PCA.

Overview
Principal components analysis (PCA) takes a set of variables and reduces them to one or more components
that represent the variables’ variance. Each component would represent the set of variables or a subset of the variables. For example, a set of questions on an Attitude Toward Science instrument might reduce, say, 20
questions into two principal components. One component might comprise a set of questions that represent a
desire to pursue a career in a science field, and the second might comprise the remaining questions that
represent a positive attitude toward science. We would consider this a two-dimensional construct, where the
two components are most likely correlated (oblique). PCA is a statistical technique that permits the reduction
of multiple variable relations (correlations/covariance) into a fewer set of components (dimensions). The
principal component scores are then used in other statistical techniques (t test, analysis of variance, multiple
regression, etc.).

Principal components derived from a set of variables require subjective naming, the same as in factor analysis. The interpretation is helped by the degree to which a variable is associated with a principal component.
It is therefore in the purview of the researcher to provide a rationale and support for the naming convention.

PCA uses the correlation matrix (R matrix) or variance–covariance matrix (S matrix) to obtain weights for the
linear combination of observed variables. The weights may be derived for a single principal component or for
several principal components. The components therefore represent the original number of variables. The
principal components have the distinct property of not being correlated (orthogonal). The principal
components also are derived so that the first component explains the most variable variance, with subsequent components explaining successively less variance. The goal is to have a few
components that account for most of the variable variance in the correlation or variance–covariance matrix.
Another unique feature of principal components is that using the linear weights of the components reproduces
the original correlation (variance–covariance) matrix. This demonstrates that PCA is dissecting the variable
variance mathematically, which is distinctly different from factor analysis where the factor loadings represent
the correlation of the variable with the factor—that is, commonality or common shared variance with a
construct.

Assumptions
Multivariate statistics requires data screening prior to running any statistical procedures. Although data
screening is time-consuming, it is a necessary prerequisite to avoid problems when analyzing data. Most
multivariate statistics textbooks cover this important topic—for example, Raykov and Marcoulides (2008) discuss proofreading data for entry errors, checking descriptive statistics, examining frequency distributions of
variables, identifying whether outliers are present in the data, and checking variable distribution assumptions
(normality) and variable transformations in the presence of skewness and kurtosis. You would be wise to
spend the time getting to know your data before embarking on the use of multivariate statistics.

The three main assumptions a researcher should meet to conduct a PCA are related to sphericity, sampling adequacy, and a positive determinant of the correlation or variance–covariance matrix. In R, sphericity is tested using the Bartlett chi-square test, sample adequacy is tested using the KMO test, and the determinant of a matrix is computed with a built-in function.

Bartlett Test (Sphericity)
The sphericity assumption is tested using the Bartlett chi-square test. The test is to check whether an identity
matrix is present; thus, it tests whether sufficient correlation exists in the correlation matrix to proceed. If the
Bartlett chi-square test is statistically significant, we can proceed with the PCA. However, if the Bartlett chi-
square test is nonsignificant, stop! The off-diagonal values (correlations/covariance) are not statistically
significant. The issue of whether significant bivariate correlations exist can be inspected by computing the
correlation matrix and printing the p values for each correlation.

KMO Test (Sampling Adequacy)
The KMO test is a measure of sampling adequacy. It ranges from 0 to 1 with values closer to 1 indicating that
the sample is adequate for the analysis. There is no statistical test for KMO, so many researchers shy away
from reporting KMO. Many multivariate statistics books cite a 20:1 ratio—that is, 20 subjects per variable as
a minimum for determining adequate sample size. Today, we have large data sets available, so meeting this
minimum sample size is not usually a problem. Many of the statistics packages (SAS, SPSS, STATA) report
power and effect size, which is related to sample adequacy. However, in planning a study, there are tools
available for judging sample size, power, and effect size for various statistical tests (Faul, Erdfelder, Buchner,
& Lang, 2009; Faul, Erdfelder, Lang, & Buchner, 2007).

Determinant of Correlation Matrix
The determinant of a matrix must be positive to proceed with the PCA. Multivariate statistics textbooks
generally have an in-depth coverage of matrix algebra, including matrix operations (add, subtract, multiply,
and divide). They also include the basic calculation of a determinant of a matrix and eigenvalues. The
determinant basically indicates the freedom to vary, so a determinant equal to zero would indicate complete
predictability in a matrix—that is, linear dependency. Linear dependency is when one variable is a linear
combination of the other variables. A matrix with a determinant of zero is a singular matrix. For example, if
correlations were 1.0 in the matrix, perfect prediction, and linear dependency, the determinant would be zero.
The determinant is used in finding the inverse of a matrix, which is used to compute eigenvalues. If the
determinant is zero, then the matrix is singular, and no inverse matrix is possible. We should be able to
multiply a matrix by its inverse to get an identity matrix. If the determinant is zero, there are no eigenvalues
(generalized variance); thus, no solution is possible. Basically, principal component weights could not be
computed.

Basics of Principal Components Analysis
The basic PCA approach can be broken down into a few important steps. First, examine the correlation (R)
or variance–covariance (S) matrix. The matrix must be a square matrix—that is, it must have the same number
of rows and columns. The R matrix would have 1’s in the diagonal and correlations in the off diagonal. The S
matrix would have variance of each variable in the diagonal and covariances in the off diagonals. For example,
the S matrix in Raykov and Marcoulides (2008, p. 217) is entered as follows:

The S matrix represents the five personality measures for n = 144 sophomores. The variances are given as
44.23, 55.13, 61.21, 57.42, and 33.34. Recall from your basic statistics course that the square root of the
variance is the standard deviation, and each bivariate correlation is computed as the covariance divided by the
square root of the product of the individual variable variances (Schumacker, 2014). The conversion from a
covariance matrix to a correlation matrix is easily given using the following cov2cor() function, which is in the
R base stats package:

To test whether these bivariate correlations are statistically significant, use the following R command:

The determinant of each matrix is computed using the det() function. The determinant of each type of matrix
will be different. The variance–covariance matrix contains the original scale of each variable—that is, mean
and standard deviation. The correlation matrix is in standard score form; that is, the variables have a mean
equal to 0 and a standard deviation equal to 1. Recall, the diagonal values in the correlation matrix are all
equal to 1.0. The correlation matrix therefore places all variables on the same scale of measurement, while the
variance–covariance matrix retains the original variable scale. You will get different principal component
weights depending on which type of matrix you use (think of multiple regression with an intercept term and
multiple regression without an intercept term). Multiple regression without an intercept term places the
regression line through the origin of the Y and X axes (0,0), while multiple regression with an intercept term starts the regression line at a point on the Y axis, denoting a starting value for the slope relationship (Schumacker, 2014). The
determinants of the two types of matrices are as follows:

Given the positive determinant, we know that the matrix can be inverted to yield the eigenvectors and associated eigenvalues for the principal components. These are obtained using the following R command:
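A sketch, assuming the variance–covariance matrix from Raykov and Marcoulides (2008) has been entered as a 5 × 5 matrix object named S (a hypothetical name):

> eigS <- eigen(S)
> eigS$values     # eigenvalues in descending order
> eigS$vectors    # eigenvectors (component weights)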

There are five components with descending eigenvalues (166.02193, 45.32619, 35.88278, 28.57806, and
14.54104). The sum of these eigenvalues is equal to the sum of the variable variances in the variance–
covariance matrix (S). The sum of the variable variances in the S matrix is 290.35. The sum of the eigenvalues
for the five principal components is 290.35, which is referred to as the trace of a matrix. The sum of the
variable variances indicates the total amount of variance that is available to be partitioned across the five
principal components. If these are not equal, get in touch for a free cup of coffee!

We can check this solution by computing the identity matrix. The identity matrix is computed by multiplying
the matrix of eigenvectors by the transpose matrix of eigenvectors: I = V * V-1 (recall, when you multiply a
number by its reciprocal, you get 1.0—that is, 9 × [1/9] = 1). The R commands are given as follows:
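A sketch of the check, using the eigenvectors from the eigen() run above; for a symmetric matrix the eigenvector matrix is orthonormal, so its transpose is also its inverse:

> V <- eigS$vectors
> round(V %*% t(V), 2)    # returns the 5 x 5 identity matrix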

The identity matrix has 1s in the diagonals and 0s in the off diagonal of the square matrix.

We can repeat these steps using the correlation matrix. The R command for the eigenvalues and
corresponding eigenvectors (component weights) is as follows:
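A sketch, assuming the correlation matrix was obtained with Rmat <- cov2cor(S):

> eigR <- eigen(Rmat)
> eigR$values     # eigenvalues sum to 5, the number of variables
> eigR$vectors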

The sum of the eigenvalues from a correlation matrix will equal the number of variables. So if we sum the
eigenvalues above, they will equal 5. The variables are in standard score form, with each variable having a
variance equal to 1, so it makes sense that a value of 5 represents the total amount of variance available to
partition across the principal components. You should also notice that once again the eigenvalues are given in
a descending order: The first principal component accounts for the most variable variance (2.88/5 = 58%), with the remaining components accounting for successively smaller amounts.

Recall from the previous chapter that the eigenvalues can also be computed as the product of the transposed
eigenvector matrix times the correlation matrix, then times the eigenvector matrix, which was given in the
matrix expression: E = V′RV. The R commands using these matrices are given as follows:
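A sketch of that computation, using the correlation matrix Rmat and the eigenvectors from the sketch above:

> V <- eigR$vectors
> E <- t(V) %*% Rmat %*% V
> round(diag(E), 2)    # the diagonal of E contains the eigenvalues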

The identity matrix is computed using the eigenvectors in a matrix multiplied by the transpose of the
eigenvector matrix, same as before. We output the results in a file, extract only the eigenvectors, create a
second file that contains the transpose of the eigenvector matrix, and then multiply these two matrices. The R
commands are as follows:

Principal Component Scores
A few final points should be made about principal components. The eigenvectors for each component are the
weights used in a linear equation to compute a score. For example, using the 5 eigenvectors from the S matrix
with V1 to V5 representing the five personality variables, we would compute component scores (Y1 to Y5) as
follows:

The principal components are orthogonal, linearly uncorrelated, and the number of components will be less
than or equal to the number of variables.

Note: You would need the raw data file to compute the principal component scores. The sum of the
eigenvalues is equal to the sum of the variable variances. PCA therefore divides up variable variance into one
or more principal components. This is why PCA is considered a mathematical approach to
decomposing the variable variance.

Note: A list of statistical functions can be obtained from > library(help=“stats”), and matrix algebra commands
from http://www.statmethods.net/advstats/index.html

Principal Component Example

R Packages
Many of the same R packages used for factor analysis are used to compute principal components. Therefore,
you will need to install and load the following R packages:

Data Set
You will need to go to the following website and download the WinZip file that contains all the data sets
from Raykov and Marcoulides (2008). Their website is
http://www.psypress.com/books/details/9780805863758/. Once you have downloaded the zip file, you will
need to extract the data sets to a file folder on your computer directory.

The example uses the data set ch7ex1.dat, which is a tab-delimited data file and therefore requires the appropriate input format. The file is read from the root directory (C:/), so the R commands would be as
follows:
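A sketch of the commands, assuming the file has no header row and that the columns are in the order of the variable names used later in the chapter (both are assumptions; adjust to match the downloaded file):

> ch7ex1 <- read.table("C:/ch7ex1.dat", header = FALSE, sep = "\t")
> names(ch7ex1) <- c("IRLETT", "FIGREL", "IRSYMB", "CULTFR", "RAVEN")
> dim(ch7ex1)    # should return 161 rows and 5 columns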

The data set contains five variables and n = 161 subjects.

Note: You can find information on reading in different types of data files by using the R command > ?
read.table

The correlation matrix is computed with the following R command:

The variance–covariance matrix is computed with the following R command:

We can convert the covariance matrix to a correlation matrix using the cov2cor() function. The R command is
as follows:
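A sketch of the three commands referenced above, using the hypothetical object names ch7ex1, Rmat, and Smat:

> Rmat <- cor(ch7ex1)    # correlation matrix
> Smat <- cov(ch7ex1)    # variance–covariance matrix
> cov2cor(Smat)          # converts the covariance matrix back to a correlation matrix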

Note: You can check the statistical significance of the bivariate correlations using the psych package and the
corr.p() function. The R commands are as follows:
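A sketch, assuming the correlation matrix Rmat from above and the sample size of 161:

> corr.p(Rmat, n = 161)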

Assumptions
We are now ready to determine if we can proceed with a PCA. There are three assumptions we should always
check: (1) sphericity, (2) sample adequacy, and (3) positive determinant of the matrix. The Bartlett chi-square
tests whether the matrix displays sphericity—that is, an identity matrix. An identity matrix would have 1s on
the diagonal and 0s on the off diagonal; thus, no correlation exists. The Bartlett test needs to be statistically
significant to proceed—that is, sufficient correlation must exist in the matrix. The KMO test ranges from 0 to
1, with values closer to 1 indicating sample size adequacy. The determinant of the correlation matrix needs to
be positive, which indicates that we can extract variance.

The Bartlett and KMO tests are in the rela package and are computed using the paf() function.
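A sketch of the paf() call, assuming the data frame ch7ex1 from above; paf() expects a numeric matrix:

> library(rela)
> paf.out <- paf(as.matrix(ch7ex1))
> summary(paf.out)    # reports the KMO index and the Bartlett chi-square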

The KMO test is close to 1 (KMO = .86), so we would conclude that n = 161 with 5 variables is an adequate
sample size. Recall, many multivariate statistics books cite a 20:1 rule of thumb (5 variables × 20 = 100
subjects). The reported Bartlett chi-square of 614.15 is not indicated with a p value; therefore, we must run
the following R command to determine statistical significance.
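A sketch of the p-value calculation, using the reported chi-square and its degrees of freedom:

> pchisq(614.15, df = 10, lower.tail = FALSE)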

The Bartlett χ2 = 614.15, df = 10, p < .00001 (the scientific notation overrides the printing of decimal values
when extreme). Our final concern is the determinant of the matrix. The determinant is positive (.02). The R
command is as follows:
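A sketch, using the correlation matrix Rmat from above:

> det(Rmat)    # positive, approximately .02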

We have now satisfied our three assumptions for conducting a PCA.

Number of Components
The PCA is computed using the psych package and the principal() function (not the fa() function). The
default R command setting is given, which provides for one component and no component scores.
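A sketch of the default call, assuming the correlation matrix Rmat from above (principal() also accepts the raw data frame):

> pc1 <- principal(Rmat)    # defaults to one component and no component scores
> pc1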

We can interpret this initial output as follows. The SS loadings value is 3.80, which is the eigenvalue for
the single principal component. The proportion variance equal to .76 is the average of the h2 values (∑h2/m).
The eigenvalue is the sum of the h2 values; therefore, ∑h2/m = 3.80/5 = .76! That leaves 24% unexplained
variance. This could be due to another principal component or residual error variance.

To view more eigenvalues that represent 100% of the variance in the correlation matrix, we can extend the
number of components and use the following R command:
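A sketch of the extended call, requesting all five components; the rotate = "none" argument is an assumption:

> pc5 <- principal(Rmat, nfactors = 5, rotate = "none")
> pc5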

The eigenvalues for the 5 principal components are given in descending order: PC1 = 3.8 (76%), PC2 = .43
(9%), PC3 = .40 (8%), PC4 = .24 (5%), and PC5 = .13 (3%). The sum of the eigenvalues explained variance
equals 100%. (Note: There is rounding error in the percents listed using two decimal places). The cumulative
variance, however, indicates the incremental explained variance from PC1 to PC5 that sums to 100%. The h2
(variable explained variance) is now 1.0, although u2 (residual variance) does indicate a very small amount of
residual error.

A check of the Cronbach’s alpha reliability coefficient indicates high internal consistency of response (α =
.92), so it does not adversely affect the PCA results.

Scree Plot
The scree plot is a very useful tool when deciding how many principal components are required to explain the
variable correlation (covariance). The general rule is to select eigenvalues that are greater than 1.0. We already
have seen the eigenvalues for the five principal components. The first component has an eigenvalue of 3.8,
while all others were less than 1.0. We should see this when plotting the eigenvalues. The R command is as
follows:
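A sketch of the scree plot, using the eigenvalues stored in the principal() output object from above:

> plot(pc5$values, type = "b", main = "Scree Plot", ylab = "Eigenvalues", xlab = "Component Number")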

Another type of scree plot is the parallel scree plot. The fa.parallel() function, however, now includes the
arguments fm=“pa” and fa=“pc” for a principal components, rather than a factor, analysis. The parallel scree
plot is given by the R command:
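A sketch of the parallel analysis call, assuming the raw data frame ch7ex1:

> fa.parallel(ch7ex1, fm = "pa", fa = "pc", main = "Parallel Analysis Scree Plot")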

A plot of the five principal component model graphically reveals the component structure. The factor analysis
loadings show high validity coefficients (.8, .9). These would be used to compute factor scores and are scaled
accordingly, as presented in Chapter 9.

Results indicated that a single component will summarize the five variable relations and yield 76% of the
variable variance. The principal component equation to generate the scores is computed using the first set of
weights.
Yi=.87(IRLETT)+.86(FIGREL)+.92(IRSYMB)+.88(CULTFR)+.81(RAVEN)

We will need to first declare the PCA data set as a data frame. This is done so that the variable names can be
used in the formula. Next, we compute the principal component scores using the weights in a formula:
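A sketch of those steps, using the hypothetical data frame name ch7ex1 and the component weights given in the equation above:

> pcadata <- data.frame(ch7ex1)
> Y <- with(pcadata, .87*IRLETT + .86*FIGREL + .92*IRSYMB + .88*CULTFR + .81*RAVEN)
> head(Y)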

Once again, we find ourselves trying to make sense out of the scores. What does a 66.213 mean? We need to
create a scale score that ranges from 0 to 100 for a more meaningful interpretation. The formula was given in
the last chapter.

Once again, a graph of the principal component scores and the scaled scores shows their equivalence. However,
the scaled scores provide us with a meaningful interpretation. The five mental ability variables were reduced to
a single component, which I will call Mental Ability. A person with a scaled score more than 50 would possess
above average mental ability, while a person with a scaled score less than 50 would possess a lower than
average mental ability.

Reporting and Interpreting
Principal components analysis was conducted for 5 variables (IRLETT, FIGREL, IRSYMB, CULTFR, and
RAVEN), which had statistically significant bivariate correlations. The assumptions for sphericity, sample
adequacy, and determinant of the matrix were tested. The Bartlett chi-square test = 614.15 (p < .00001),
which was statistically significant indicating that sufficient correlations were present in the matrix for analysis.
The KMO test = .86, which is close to 1.0, so we would conclude that n = 161 with 5 variables
is an adequate sample size for the analysis. Finally, the determinant of the correlation matrix = .02, which is
positive with the following eigenvalues in descending order for the 5 variables (3.80, 0.43, 0.40, 0.24, 0.13). The
eigenvalues should sum to 5, the number of variables. PCA indicated a single unidimensional component with
76% variance explained. The scree plot indicated a single eigenvalue greater than one. Table 10.1 indicates the
standardized loadings (PC1), the commonality estimates (h2), and the residual estimates (u2). The h2
(explained variance) plus u2 (unexplained variance) equals one for each variable. The internal consistency
reliability coefficient (Cronbach’s alpha) indicated consistency of scores, that is, score reproducibility (α = .92).
Principal component scores would be computed using the following equation with the component variable
weights:

Yi=.87(IRLETT)+.86(FIGREL)+.92(IRSYMB)+.88(CULTFR)+.81(RAVEN)

Summary
PCA takes a set of variables and reduces them to one or more components that represent the sum of the variables’ variance. The diagonal of a correlation matrix indicates each variable’s variance—that is, 1.0.
Therefore, the number of variables equals the amount of variance in the correlation matrix to be explained.
PCA is a statistical technique that permits the reduction of a number of variables in a correlation matrix into
fewer components (dimensions). Principal components that are derived based on a set of variables also require
subjective naming, the same as in factor analysis.

This chapter also points out the importance of the determinant of a matrix, whether correlation or variance–
covariance. It also shows the importance of the eigenvalues of a matrix. In PCA, the sum of the eigenvalues
from a correlation matrix will equal the number of variables.

The eigenvectors for each component are the weights used in a linear equation to compute a score, which can
be used in other statistical analyses. The principal component scores when rescaled from 0 to 100 provide a
more meaningful interpretation. The principal components are orthogonal, linearly uncorrelated, and the
number of components will be less than or equal to the number of variables.

Exercises
1. Describe principal components analysis in a few brief sentences.
2. Define the determinant of a matrix.
3. Define eigenvalues and eigenvectors.
4. Conduct a PCA using the following data set, attitude.txt, with five components.
a. Report determinant, Bartlett chi-square, and KMO.
b. How many principal components had eigenvalues >1.0?
c. Did the scree plot confirm this?
d. How much extracted variance was explained by the principal components?
e. What would you name the main components?
f. Does the eigenvector matrix times its transpose equal an identity matrix?

Web Resources
GPower software for determining sample size, power, and effect size for various statistical tests

http://www.gpower.hhu.de/en.html

Matrix algebra commands useful in R

http://www.statmethods.net/advstats/index.html

References
Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2009). Statistical power analyses using G*Power 3.1:
Tests for correlation and regression analyses. Behavior Research Methods, 41, 1149–1160.

Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis
program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191.

Raykov, T., & Marcoulides, G. A. (2008). An introduction to applied multivariate analysis. New York, NY:
Routledge (Taylor & Francis Group).

Schumacker, R. E. (2014). Learning statistics using R. Thousand Oaks, CA: Sage.

11 Multidimensional Scaling

Overview
Assumptions
Proximity
MDS Model
MDS Analysis
Sample Size
Variable Scaling
Number of Dimensions
R Packages
Goodness-of-Fit Index
MDS Metric Example
MDS Nonmetric Example
Reporting and Interpreting Results
Summary
Exercises
Web Resources
References

Green, B. (1999). Warren S. Torgerson. Psychometrika, 64(1), 3-4

Warren S. Torgerson (November 10, 1924, to February 1, 1999) was professor emeritus of psychology at Johns Hopkins
University. In 1947, he received his bachelor’s degree from the California Institute of Technology. In 1951, Torgerson received his
PhD in psychology from Princeton University. In 1955, he joined the faculty at the Massachusetts Institute of Technology. In 1958,
he published his well-known Theory and Methods of Scaling. In 1964, he received the rank of professor of psychology at Johns
Hopkins University. He served as department chair until 1969. He taught quantitative psychology and history of psychology until his
retirement in 1997. In 1997, Dr. Torgerson received the lifetime contributions award from the American Psychological Association.
Torgerson was a former president of the Psychometric Society and a member of the Society of Experimental Psychologists, the
Society for Mathematical Psychology, and the American Association for the Advancement of Science.

http://www.people.fas.harvard.edu/~banaji/research/speaking/tributes/shepard.html

Roger N. Shepard (January 30, 1929, to present) is regarded as the father of research on spatial relations. He is the author of Toward
a Universal Law of Generalization for Psychological Science (Shepard, 1987). In 1955, Shepard received his PhD in psychology from
Yale University. His postdoctoral training was with George Miller at Harvard. In 1958, he worked at Bell Labs. In 1966, he was a
professor of psychology at Harvard. In 1968, Dr. Shepard joined the faculty at Stanford University. In 1995, he received the
National Medal of Science. In 2006, he received the Rumelhart Prize and is a professor emeritus of social science at Stanford
University. Shepard added to multidimensional scaling by developing the Shepard diagram. Shepard further developed the use of
ordinal proximities in nonmetric multidimensional scaling (Shepard, 1962, 1963).

http://three-mode.leidenuniv.nl/people/html/kruskal.htm

Joseph B. Kruskal (January 29, 1928, to September 19, 2010) was regarded as a mathematician, statistician, computer scientist, and
psychometrician. In 1954, he completed his PhD from Princeton University. From 1959 to 1993, Kruskal worked at Bell Labs. He
created the Kruskal’s algorithm for computing the minimal spanning tree of a weighted graph. His technique was used in the
construction and pricing of communication networks. He was a Fellow of the American Statistical Association, former president of
the Psychometric Society, and president of the Classification Society of North America. His brother, William Kruskal, developed
the Kruskal–Wallis one-way analysis of variance. Joseph B. Kruskal added to multidimensional scaling by developing the Kruskal
STRESS goodness-of-fit test and further developed nonmetric dimensional scaling (Kruskal, 1964a, 1964b).

Overview
Multidimensional scaling (MDS) was first introduced by Torgerson (1952). MDS helps visualize the level of
similarity or dissimilarity among objects (Borg & Groenen, 2005; Cox & Cox, 2001). The objects of interest
could be people, voting, events, colors, stocks, distance traveled between cities, and so on. The MDS
algorithm places the objects in N-dimensional space so that between-object distances reveal groupings, where δij is the distance between the ith and jth objects. There are several variations referred to as
classical, metric, nonmetric, and generalized (Torgerson, 1958). Kruskal (1964a, 1964b) developed an index of
goodness of fit. In classical MDS, the Euclidean distance function is used rather than a metric or generalized
distance function (Kruskal & Wish, 1978; Mardia, 1978).

MDS is not used in some disciplines probably because it is not understood or is confused with EFA and
PCA. MDS is considered by some as a principal axis factoring method. EFA and PCA obtain underlying
dimensions from subjects’ responses to questions on a questionnaire. MDS obtains underlying dimensions
from subjects’ judgments about the similarity of objects in paired comparisons. Meyers, Gamst, and Guarino (2013) reported that the process of rating paired comparisons has a rich history with Gustav Fechner’s
method of constant stimuli and J. Cohn’s method of paired comparisons, which was used by Wilhelm Wundt
in his research on feelings and by Louis L. Thurstone in his research on comparative judgments in the 1900s.
MDS functions, however, can be a good complementary method after a profile analysis, finding two or more
factors in EFA, or components in PCA, since MDS can use correlation matrices as input data. MDS would
show the two sets of variables or items in different spatial dimensions. For example, Graduate Record Exam
(GRE) has three scores based on items that measure verbal, quantitative, and analytic knowledge. The GRE
is multidimensional, so each set of items or the three domains would reside in a different dimension. MDS
has the advantage of being an exploratory data analysis technique that can further explore psychological
domains without the numerous assumptions of other multivariate statistical tests (Jaworska & Chupetlovska-
Anastasova, 2009).

The input data for MDS analyses are called proximities, which indicate the similarity or dissimilarity among
the objects. A classic MDS approach would be to analyze a correlation matrix. High correlation values would
represent small distances and would display visually together on a spatial mapping. Questionnaire survey data
that use paired comparisons are also a typical type of data that MDS analyzes. For example, using a Likert-
type scale from similar to dissimilar, compare Coke and Pepsi. Next, compare Pepsi and Sprite, followed by
Coke and Sprite. If responses are as expected, we should find Coke and Pepsi similar and Sprite dissimilar. If
you add other soft drink products that are branded as cola products versus noncola products, the paired
comparisons should show up in MDS analysis. Another example might be distances between cities. If we
arrange data to indicate miles between cities, MDS analysis should place nearby cities close together and distant cities farther apart. In marketing research, preferences and perceptions of choices selected can be visually shown
on a map of the brand names, called perceptual mapping (Green, 1975). The soft drink and miles between
cities are two examples that would be analyzed in two-dimensional space. A researcher could conceivably
entertain a three-dimensional solution or more if applicable to the groupings, for example, chest pain diagnostics.

So in many ways, MDS analysis should reflect your understanding of what objects are similar and dissimilar
on the trait being compared.

The classical (metric) MDS based on Euclidean distances is determined by computing the distances between
pairs of objects as follows:
dij=√[(xi−xj)2+(yi−yj)2].

The data are discrete distances between pairs of objects on a map—for example, distances between cities. The
advantage of the classical MDS analysis is that the solution requires no iterative procedures. When the data
are proximities determined by subjects comparing pairs of objects, say on a survey questionnaire, nonmetric
MDS is utilized (Borg & Groenen, 2005).
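A minimal sketch of a classical (metric) MDS analysis, using the built-in eurodist data set (road distances between 21 European cities) and the cmdscale() function from the base stats package, which is discussed later in this chapter:

> fit <- cmdscale(eurodist, k = 2)    # two-dimensional solution
> plot(fit[, 1], fit[, 2], type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
> text(fit[, 1], fit[, 2], labels = rownames(fit), cex = 0.7)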

Assumptions
MDS does not require any assumptions. However, there are a few decisions that a researcher must make that
affect MDS results (Wickelmaier, 2003). If the proximity matrix contains a common metric where distances
are used, then a metric analysis with Euclidean distances should be used to visualize the structure in the data.
However, if the rank order is imposed on the proximity matrix, then a nonmetric analysis should be used. The
choice of a stress measure will also affect how MDS results are displayed. Finally, the number of dimensions
selected for the MDS solution will determine how well the analysis represents the data structure. The
decisions are based on type of proximities (similar vs. dissimilar, direct vs. indirect, symmetric vs. asymmetric),
MDS model (metric vs. nonmetric, Euclidean vs. non-Euclidean, type of stress), number of dimensions (N
dimensions, number of objects), MDS analysis (individual, aggregate, weighted), and software used.

The choices for proximity matrix, MDS model, MDS analysis, sample size, variable scaling, and number of
dimensions are further clarified below.

Proximity Matrix

MDS Model

MDS Analysis

Sample Size
The data structure contains proximities—that is, distances or measures that indicate the objects of interest and
their closeness to each other. If the data are based on different sample sizes, a common scale could be
computed. For example, the frequency count divided by the sample size would yield percentages, and the percentages would be put into the proximity matrix. Most researchers would consider percentages a rank ordering, hence a nonmetric approach. If the rank of an object is used, sample size differences are not important.

310
Variable Scaling
MDS is based on proximities to establish the similarity or dissimilarity of the objects. If the data matrix
contains the miles between cities, then a common metric scale is used. However, if we have variables with
different scales of measurement, they would need to be placed on a common scale for metric analysis. For
example, if you had orange juice and apple juice, these could be measured in ounces. A correlation matrix by
definition is standardized—that is, variables have mean = 0 and standard deviation = 1. In this case, the
variable metric is the same.

When conducting market and survey research where respondents provide the self-reported ratings, a
nonmetric MDS should be conducted. The nonmetric approach only assumes that the monotonic order of the proximities is meaningful—for example, a Likert-type scale running from strongly agree, agree, neutral, disagree, to strongly disagree.

311
Number of Dimensions
MDS can involve the selection of one or more dimensions. If you hypothesize that a set of items represents a
unidimensional scale, then a one-dimensional MDS solution would be sought. The number of dimensions is
affected by the number of objects (Borg & Groenen, 2005). For example, an N-dimensional solution requires
4N objects—that is, a two-dimensional solution requires a minimum of 8 objects. The STRESS goodness-of-
fit measure is also affected by the number of dimensions. Computing P2 and Mardia criteria are useful
indicators of how much variance the N-dimension solution provides. It is worthwhile to display the scree plot
and the Shepard diagram to further explore the number of dimensions.

312
R Packages
There are a few R packages available to run classical (metric) and nonmetric MDS analyses. You can obtain a
comprehensive list of packages using the following R command:
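For example, the packages installed on your system can be listed with:

  library()                  # lists all installed packages
  # or: installed.packages()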

However, to specifically search for MDS packages, I suggest you use the following:
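A search of the installed help files or of the R websites works here, for instance:

  help.search("multidimensional scaling")   # search installed documentation
  RSiteSearch("multidimensional scaling")   # search CRAN and the R site online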

To find out more about a package, use this command:
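For example, for the MASS package:

  help(package = "MASS")     # lists the functions and data sets in a package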

The HSAUR, MASS, psych, psy, vegan, and stats packages are used at some point in this chapter. The HSAUR
package (Handbook of Statistical Analysis Using R) includes multiple data sets and has a specific Chapter 14
(Everitt & Hothorn, 2009), which conducts an MDS analysis of voting behavior using both metric and
nonmetric MDS analyses. It reports the P2, Mardia criteria, scree plot, and Shepard diagram with R code.
The MASS package computes the Kruskal nonmetric MDS analysis using the isoMDS() function. The psych
package uses the cor2dist() function to convert correlations to distances, which is necessary for MDS analyses.
The stats package computes the classical (metric) MDS analysis using the cmdscale() function. The vegan
package has the metaMDS() function that affords rotation to a proper solution, a wcmdscale() function to
perform weighted MDS analyses, and the eigenvals() function that extracts eigenvalues from many different
sources (factor analysis, principal components, and correlation or covariance matrices).

There are other packages that are not used in this chapter, but they are available in R and/or RStudio. The
other available packages are as follows:

313
Goodness-of-Fit Index
The problem in nonmetric MDS is to find a mapping that minimizes the squared differences between optimal
scaled proximities and distances between ordinal ranked objects. The coordinate points for mapping ordinal
proximities are determined by minimizing STRESS, which is computed as follows:
STRESS = \sqrt{\frac{\sum \left( f(p) - d \right)^2}{\sum d^2}},

where p is the proximity matrix, f(p) is a monotonic transformation of the p matrix to obtain optimally scaled
proximities, and d is the point distances. The STRESS value is subjectively interpreted (Kruskal, 1964a): .00
(perfect), .025 to .05 (good), .10 (fair), and .20 (poor). The value of STRESS will decrease as the number of
dimensions in the MDS solution increases.

The P2 and Mardia criteria (Mardia, 1979) have also been proposed. The P2 value is computed as the sum of the
eigenvalues for the number of dimensions divided by the total sum of the eigenvalues. Recall that the
eigenvalue is a measure of the generalized variance, so P2 indicates the amount of generalized variance
explained by the dimensions out of the total variance in the matrix. The P2 values range from 0 to 1 with
values closer to 1.0 indicating a better fit. Given the two dimensions, the P2 value is computed as follows:
P2 = \frac{\lambda_1 + \lambda_2}{\sum_{i=1}^{N} \lambda_i}.

The Mardia criterion is a variation on P2, where the numerator and denominator values are squared. Mardia
values also range from 0 to 1 with values closer to 1.0 indicating a better model fit. The Mardia value for a
two-dimension solution would be computed as follows:
Mardia = \frac{\lambda_1^2 + \lambda_2^2}{\sum_{i=1}^{N} \lambda_i^2}.

Note: The eigenvalue extraction may yield negative eigenvalues, so the absolute values are used prior to summing.
Eigenvalues should be positive, but in MDS, extraction of all eigenvalues generally yields some negative
values.

Two other methods have been proposed for determining the number of dimensions: scree plot and Shepard
diagram. In the scree plot, the amount of STRESS is plotted against the number of dimensions. Since
STRESS decreases with the increase in the number of dimensions, we seek the lowest number of dimensions
with an acceptable STRESS level. The interpretation is similar to the scree plot in factor analysis, where the
elbow of the curve denotes the cutoff for determining the number of dimensions. The scree plot can be
plotted using the psy package and the scree.plot() function. The Shepard diagram displays the relation
between the optimally scaled proximities and the point distances. The Shepard diagram indicates a good fit
when there is less spread from the line of fit. This is a similar interpretation to the line of best fit in regression;
however, the points indicate a monotonically increasing trend of the optimally scaled proximities.

Note: There are other popular dissimilarity (distance) measures used in ecological data analysis (manhattan, gower, bray, jaccard, horn, etc.). They are available in the vegdist() function of the vegan package.

315
MDS Metric Example
The classical (metric) MDS analysis using distances between cities is the first example. The proximity matrix
d is the key to the classical MDS analysis. I will use the distances between cities to illustrate the metric MDS
analysis. The d proximity matrix is constructed as follows:
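For illustration, a proximity matrix of approximate road mileages for three east coast and three west coast cities might be entered as follows (the city names and mileages here are illustrative values; the original listing may have used different cities):

  cities <- c("NYC", "Boston", "DC", "LA", "SF", "Seattle")
  d <- matrix(c(
          0,  215,  225, 2790, 2900, 2850,
        215,    0,  440, 2990, 3100, 3050,
        225,  440,    0, 2670, 2800, 2760,
       2790, 2990, 2670,    0,  380, 1140,
       2900, 3100, 2800,  380,    0,  810,
       2850, 3050, 2760, 1140,  810,    0),
       nrow = 6, byrow = TRUE, dimnames = list(cities, cities))
  d        # symmetric, with zeros on the diagonal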

The three cities from the east coast and three from the west coast were chosen to maximize the separation and
provide a very clear MDS result. The proximity matrix should be symmetrical with upper and lower triangles
having the same values. The diagonal of the proximity matrix should contain zeros (0). A check of the matrix
is important prior to running the MDS analysis.

The classical MDS analysis computes the Euclidean distances using the dist() function. The R commands
would be as follows:
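A minimal sketch:

  euclid <- dist(d)    # Euclidean distances computed from the proximity matrix d
  euclid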

The Euclidean distances are directly computed using the formula discussed before; thus, no iterative process is
used to derive the values.

The classical MDS is now computed using the cmdscale() function as follows:
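For example, requesting two dimensions and the eigenvalues:

  fit2 <- cmdscale(euclid, k = 2, eig = TRUE)
  fit2$points    # coordinates (eigenvectors) for the two dimensions
  fit2$eig       # eigenvalues
  fit2$GOF       # goodness-of-fit values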

The $points are the two eigenvectors, and the $eig reports six eigenvalues. The $GOF reports the P2 and
Mardia criteria values. The P2 is computed as follows:
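A sketch of the hand computation, using the absolute values of the eigenvalues stored in fit2$eig:

  sum(abs(fit2$eig[1:2])) / sum(abs(fit2$eig))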

The second $GOF value reports the Mardia criteria. The computed value is slightly different from the one
reported in the cmdscale() function. The Mardia criteria are computed as follows:
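A sketch of the hand computation:

  sum(fit2$eig[1:2]^2) / sum(fit2$eig^2)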

The P2 and Mardia criteria are both very close to 1.0, which indicates a good fit.

The spatial map that plots the coordinate points of the eigenvector values is displayed using the following set
of R commands:
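One way to produce the map (the minus signs implement the sign change described in the note below):

  x <- -fit2$points[, 1]
  y <- -fit2$points[, 2]
  plot(x, y, type = "n", xlab = "Dimension 1", ylab = "Dimension 2",
       main = "Classical MDS of City Distances")
  text(x, y, labels = cities)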

Note: For the visual display to appear correctly, a sign change is required in the eigenvectors. This is done by placing a minus sign (−) in front of the x and y eigenvectors. The east coast cities and the west coast cities are now mapped correctly.

A map of the United States can be drawn using the following R command, where the approximate city
locations are superimposed:
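A sketch assuming the maps package and approximate longitude/latitude values for the six illustrative cities:

  library(maps)
  map("usa")
  # approximate city coordinates (longitude, latitude) used only for labeling
  text(c(-74, -71, -77, -118, -122, -122),
       c(40.7, 42.4, 38.9, 34.1, 37.8, 47.6), labels = cities, cex = 0.8)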

The classical (metric) MDS uses a proximity matrix that inputs distance values between pairs of objects. The
distances-between-cities example is a good illustration of how the classical MDS approach calculates Euclidean distances and then maps the eigenvectors for a spatial representation.

The number of dimensions can be explored using the scree plot and the Shepard diagram with an
understanding of the number of eigenvalues. If we first extract the eigenvalues from the proximity matrix d,
we can determine numerically the number of positive eigenvalues greater than 1. The results of the eigen()
function indicate that only a single eigenvalue is positive (eigenvalues can be negative, but they are considered
noninterpretable and discarded in multivariate analyses).
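For example:

  eigen(d)$values    # eigenvalues of the proximity matrix d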

We visually explore our understanding of eigenvalues by plotting a scree plot using the psy package and the
scree.plot() function.
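One possible call, passing the eigenvalues directly (the type argument value is an assumption about the call that was used):

  library(psy)
  scree.plot(eigen(d)$values, title = "Scree Plot", type = "E")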

The scree plot indicates that only one dimension is present, thus a mono-MDS solution. This also fits our
expectations given that only a single positive eigenvalue is extracted from the proximity matrix.

The Shepard diagram is generally used in a nonmetric MDS analysis, but it is displayed here to show another
way to visually connect the dimensions on a spatial map. The Shepard diagram is typically used to visualize
the monotonic differences in the eigenvector values. The Shepard diagram is available in the MASS package
using the Shepard() function. We first install and load the MASS package.
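For example:

  install.packages("MASS")   # if not already installed
  library(MASS)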

We need to create a file with the $x and $y values for plotting given the Euclidean distances (euclid) and the
eigenvectors (fit2$points). The Shepard() function uses data in these two files and stores the results in the
dist_sh file.
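A minimal sketch:

  dist_sh <- Shepard(euclid, fit2$points)
  str(dist_sh)    # a list with $x, $y, and $yf components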

We use the values in the dist_sh file to plot the Shepard diagram. The axes are scaled from 0 to 7,000 given
the range of values for $x and $y. Also, these values are extracted from the file in the lines() function.
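One way to draw the plot:

  plot(dist_sh, xlim = c(0, 7000), ylim = c(0, 7000),
       xlab = "Proximity", ylab = "Distance")
  lines(dist_sh$x, dist_sh$yf, type = "S")   # monotone (step) line of fit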

The Shepard diagram may look unusual at first, but it does show a maximum separation of the east and west
coast cities based on their monotonic increasing distances. It also indicates that the coordinate values for each
city do not fall far from the line, which indicates a good model fit.

320
321
MDS Nonmetric Example
The nonmetric MDS approach uses ordinal data, which are often self-reported from marketing research or
survey questionnaires. The nonmetric MDS is performed using the isoMDS( ) function in the MASS
package. The data set is from the psych package called iqitems, which contains 16 items on the Synthetic
Aperture Personality Assessment for 1,525 subjects (Condon & Revelle, 2014). There are 4 reasoning
questions, 4 letter sequencing questions, 4 matrix reasoning tasks, and 4 spatial rotation questions. The 16
items were taken from a larger set of 80 items to reflect four factors or dimensions (verbal reasoning, letter
series, matrix reasoning, and spatial rotations). These four dimensions are expected to emerge from the data—
that is, the d proximity matrix and the euclid matrix of Euclidean distances, but this time, they assume
monotonic ordering of the distance values.

We first install and load the required R packages.
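For example:

  install.packages(c("MASS", "psych", "psy"))   # if not already installed
  library(MASS); library(psych); library(psy)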

The data set is acquired, and the descriptive statistics on the 16 items are computed. There are missing values for two subjects on a few of the items. We will use the na.omit() function to remove the missing cases, leaving 1,523
subjects for analysis.
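A sketch (the object name iq is arbitrary, not necessarily the one used in the original listing):

  data(iqitems)            # 16 ability items for 1,525 subjects (psych package)
  describe(iqitems)        # descriptive statistics; shows the missing responses
  iq <- na.omit(iqitems)   # remove cases with missing data
  nrow(iq)                 # 1,523 subjects remain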

We proceed by creating a correlation matrix of the 16 items, and then converting them to distances using the
cor2dist() function in the psych package. The file distNMDS is the proximity matrix used in the MDS
analysis.
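A sketch (distNMDS is named in the text; the intermediate correlation object name is arbitrary):

  corNMDS  <- cor(iq)               # 16 x 16 correlation matrix
  distNMDS <- cor2dist(corNMDS)     # convert correlations to distances (psych)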

We are now ready to run the nonmetric MDS using the isoMDS() function. The $points are the two
eigenvectors, and the $stress is the STRESS value. The subjective interpretation of the STRESS value indicates a poor model fit, since the value is not close to 0.
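The run might look like the following (the result object name is arbitrary):

  fitNMDS <- isoMDS(distNMDS)   # defaults to a two-dimensional solution
  fitNMDS$points                # coordinates (eigenvectors)
  fitNMDS$stress                # STRESS value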

We can rerun the nonmetric MDS, but this time specifying 4 dimensions, which reduced the $stress value. However, recall that adding dimensions will, by itself, reduce the stress value.
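For example:

  fitNMDS4 <- isoMDS(distNMDS, k = 4)   # four-dimensional solution
  fitNMDS4$stress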

Note: The isoMDS() function is iterative; therefore, you will get slightly different results each time you run
the function.

A plot of the eigenvectors displays a spatial mapping of the items; however, this time a minus sign (−1) is not
necessary to output the item locations correctly.
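One way to draw the item map:

  plot(fitNMDS$points, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
  text(fitNMDS$points, labels = colnames(iq), cex = 0.8)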

The reasoning questions were not grouped together—that is, items r_4, r_16, and r_19 were grouped with
letter and matrix items. The rotation items appear on a diagonal from low to high (ro_6, ro_ 8, ro_3, and
ro_4). The matrix items show up across Dimension 1 at about the same level as Dimension 2 (m_45, m_46,
m_47, m_55). The letter items are closely grouped together. Ideally, we would see four areas on the map with
each having four related items grouped together.

The scree plot will help identify the number of dimensions based on the eigenvalues of the proximity matrix.
The psy package contains the scree.plot() function, which displays the eigenvalue cutoff point. The scree plot
indicates that at least four dimensions are present—that is, eigenvalues greater than 1.0 are shown above the
dotted line.
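The plot discussed above can be produced with a call such as the following (the type argument value is an assumption about the call that was used):

  scree.plot(eigen(distNMDS)$values, title = "Scree Plot", type = "E")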

The Shepard diagram also permits a visual inspection of the ordination (monotonic ordering of the items).
The Shepard() function in the MASS package uses a lower triangle of the distance matrix. Therefore, the
dist() function was used after converting the correlation matrix to a distance matrix (see above). Also, the
results from a four-dimension solution were used in the function.
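A sketch of the two steps described above (object names are arbitrary):

  iq_d  <- dist(distNMDS)                   # lower-triangle "dist" object derived from the proximity matrix
  iq_sh <- Shepard(iq_d, fitNMDS4$points)   # four-dimensional configuration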

After running the function, we can plot the Shepard diagram using the $x and $y values. The plot command
is given as follows:
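For example:

  plot(iq_sh$x, iq_sh$y, xlab = "Dissimilarity", ylab = "Distance")
  lines(iq_sh$x, iq_sh$yf, type = "S")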

The Shepard diagram shows a good monotonic ordering of the items, except for a few items at the bottom of
the trend line. An ideal plotting of the items would be a straight line with the items positioned close to the
line of fit.

The authors intended the 16 items to be used in a factor analysis to represent a higher-order g factor. Another data set with the items scored correct (1) or incorrect (0) was also provided for conducting a factor analysis using
tetrachoric correlations. Both classical and ordinal factor analysis are possible. MDS also permits a classical
(metric) and ordinal (nonmetric) exploratory analysis.

327
Reporting and Interpreting Results
MDS is an exploratory method with many variations or options to explore. The MDS analysis can proceed
using a metric (classical) or nonmetric (ordinal) function. MDS can involve a direct method (enter exact
distances or ordinal values) or an indirect method (self-report ratings on questions) of acquiring data. MDS
can proceed with individual, aggregate, or weighted solutions. Given the different terms and approaches, it is
advised that a researcher clearly indicate the type of MDS that was conducted and define the terms used for
the reader. A sample write-up of the nonmetric MDS can be used to guide your reporting.

A nonmetric MDS analysis was conducted using 16 items from the Synthetic Aperture Personality Assessment
given to 1,525 subjects (Condon & Revelle, 2014). Missing data for two subjects reduced the sample size to
1,523. The nonmetric MDS was chosen given the use of correlations between items and the creation of an
ordinal proximity matrix. The R MASS package with the isoMDS() function was used to conduct the MDS
analysis. The STRESS value indicated a poor model fit. A scree plot indicated that at least four dimensions
were present in the proximity matrix, which would be related to the 4 reasoning questions, 4 letter sequencing
questions, 4 matrix reasoning tasks, and 4 spatial rotation questions. A Shepard diagram indicated a good
monotonic ordering for most of the items with a few exceptions. The spatial map in Figure 11.1 shows that
four distinct groups are not present for the sets of items.

Figure 11.1 Spatial Map of 16 Items

328
Summary
MDS helps visualize the level of similarity or dissimilarity among objects. MDS then obtains underlying
dimensions from the subject’s judgments about the similarity or dissimilarity of objects in paired comparisons.
The objects of interest could be people, voting, events, colors, stocks, distance traveled between cities, or
paired comparisons in a survey questionnaire. The MDS algorithm places the objects in an N-dimensional space so that the distances between objects, denoted as δij (the distance between the ith and jth object), form similar groupings. There are several MDS types referred to as classical (metric), nonmetric,
and generalized. MDS also has a GFI to indicate how well the groups are formed.

MDS has two basic approaches—the direct method, which inputs exact distances, and the indirect method,
which assigns distance based on paired comparisons. In addition, MDS has two different algorithms. The
metric (classical) uses a common scale of distances that are assumed to be the interval or ratio level of
measurement. The nonmetric method uses ordinal ratings by subjects (paired comparisons), commonly found
on survey questionnaires. The measure of MDS fit follows from a STRESS value, as well as from the
examination of a scree plot, a Shepard diagram, a P2 value, or the Mardia criterion.

The MDS analysis can involve individuals, aggregate summary data, or weighted data. Individual data are obtained when subjects respond to paired comparisons of objects. Aggregate data use averages in the proximity matrix. The weighted analysis represents object and individual differences on a spatial map—that is, it uses a separate proximity matrix for each individual to indicate different groups of subjects. This is referred to as individual difference scaling.

MDS results in the selection of one or more dimensions. If you hypothesize that a set of items represents a
unidimensional scale, then a one-dimensional MDS solution would be sought. The number of dimensions is
affected by the number of objects. For example, an N-dimensional solution requires 4N objects—that is, a
two-dimensional solution requires a minimum of 8 objects. The STRESS goodness-of-fit measure is also
affected by the number of dimensions. Computing P2 and Mardia criteria are useful indicators of how much
variance the N-dimension solution provides. A researcher should also display the scree plot and the Shepard
diagram to further explore the number of dimensions, and aid interpretation.

329
Exercises
1. Explain the difference between metric and nonmetric MDS.
2. Explain the difference between the direct and indirect method in MDS.
3. How would you interpret the STRESS goodness-of-fit value?
4. How would you interpret the P2 and Mardia criterion?
5. Explain why the number of dimensions is important in MDS.
6. Conduct a classical (metric) MDS analysis, cmdscale() function, using the burt data set in the psych package, and report results.
The data set contains 11 emotional variables in a correlation matrix (Burt, 1915) that was slightly changed (see R documentation >
help(burt)). Use the psy package and scree.plot() function with the burt data set to graph the scree plot. Compute the proximity matrix using the dist()
function, and use proximity matrix in cmdscale() function. You will need to install and load the following R packages.
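For example:

  install.packages(c("psych", "psy"))   # if not already installed
  library(psych); library(psy)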

Note: stats is the default library in R and should not have to be installed or loaded.

330
Web Resources
Handbook of spatial analysis methods with programs and data sets

http://www.cs.umd.edu/~hjs/quadtree/

Handbook of statistical analysis using R (multiple data sets described)

http://cran.r-project.org/web/packages/HSAUR/HSAUR.pdf

Introduction to MDS for the nonstatistician

http://homepages.uni-tuebingen.de/florian.wickelmaier/pubs/Wickelmaier2003SQRU.pdf

R code for classical and nonmetric MDS

http://www.statmethods.net/advstats/mds.html

331
References
Borg, I., & Groenen, P. (2005). Modern multidimensional scaling: Theory and applications (2nd ed., pp.
207–212). New York, NY: Springer-Verlag.

Burt, C. (1915). General and specific factors underlying the primary emotions (Report No. 85). Presented at
the meeting of the British Association for the Advancement of Science, Manchester, England.

Carroll, J. D., & Chang, J. J. (1970). Analysis of individual differences in multidimensional scaling via an N-way generalization of "Eckart-Young" decomposition. Psychometrika, 35, 283–319.

Condon, D., & Revelle, W. (2014). The international cognitive ability resource: Development and initial
validation of a public domain measure. Intelligence, 43, 52–64.

Cox, T. F., & Cox, M. A. A. (2001). Multidimensional scaling. London, England: Chapman & Hall.

Everitt, B. S., & Hothorn, T. (2009). A handbook of statistical analyses using R (2nd ed.). Boca Raton, FL:
CRC Press.

Green, P. (1975). Marketing applications of MDS: Assessment and outlook. Journal of Marketing, 39(1),
24–31.

Jaworska, N., & Chupetlovska-Anastasova, A. (2009). A review of multidimensional scaling (MDS) and its
utility in various psychological domains. Tutorials in Quantitative Methods for Psychology, 5(1), 1–10.

Kruskal, J. B. (1964a). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29, 1–27.

Kruskal, J. B. (1964b). Nonmetric multidimensional scaling: A numerical method. Psychometrika, 29, 115–129.

Kruskal, J. B., & Wish, M. (1978). Multidimensional scaling (Sage university paper series on quantitative
application in the social sciences, 07-011). Beverly Hills, CA: Sage.

Mardia, K. V. (1978). Some properties of classical multidimensional scaling. Communications in Statistics—Theory and Methods, A7, 1233–1241.

Mardia, K. V., Kent, J. T., & Bibby, J. M. (1979). Multivariate analysis. London, England: Academic Press.

Meyers, L. S., Gamst, G., & Guarino, A. J. (2013). Applied multivariate research: Design and interpretation
(2nd ed.). Thousand Oaks, CA: Sage.

Shepard, R. N. (1962). The analysis of proximities: Multidimensional scaling with an unknown distance
function. Psychometrika, 27, 125–139, 219–246.

Shepard, R. N. (1963). Analysis of proximities as a technique for the study of information processing in man.
Human Factors, 5, 33–48.

Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science,
237(4820), 1317–1323.

Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17, 401–419.

Torgerson, W. S. (1958). Theory and methods of scaling. New York, NY: Wiley.

Wickelmaier, F. (2003). An introduction to MDS: Sound quality research unit. Aalborg, Denmark: Aalborg
University.

333
12 Structural Equation Modeling

Overview
Assumptions
Multivariate Normality
Positive Definite Matrix
Equal Variance–Covariance Matrices
Correlation Versus Covariance Matrix
Basic Correlation and Covariance Functions
Matrix Input Functions
Reference Scaling in SEM Models
R Packages
Finding R Packages and Functions
SEM Packages
sem
lavaan
CFA Models
Basic Model
Model Specification
Model Estimation and Testing
Model Modification
Model Diagram
Multiple Group Model
Chi-Square Difference Tests
Graphing Models
Structural Equation Models
Basic SEM Model
Longitudinal SEM Models
Basic Latent Growth Model
Advanced Latent Growth Model
Reporting and Interpreting Results
Summary
Exercises
Web Resources
References

334
http://www.york.ac.uk/depts/maths/histstat/people/wold.gif

Herman Ole Andreas Wold (December 25, 1908, to February 16, 1992) pioneered the Wold decomposition in time-series analysis,
where discrete time was decomposed into two unrelated processes, one deterministic and the other a moving average. In 1927,
Herman enrolled at the University of Stockholm, Sweden, to study mathematics. He was a student of Harald Cramér, and together
they created the Cramér–Wold theorem in 1936, which provided a proof for extending the univariate central limit theorem to the
multivariate central limit theorem. In 1943, the idea of solving a system of simultaneous equations using the maximum likelihood
approach in econometric research was presented by Trygve Haavelmo (1989 Nobel Prize in Economics), however, it was questioned
by Herman Wold. From 1945 to 1965, Herman worked on his recursive causal modeling approach using least squares estimation,
believing it was an efficient method of estimation with superior properties over maximum likelihood estimation. Herman became
well-known for his exploratory methods using partial least squares, which is in contrast to the covariance-based SEM (structural
equation modeling) approach developed by his student Karl G. Jöreskog (Hair, Hult, Ringle, & Sarstedt, 2014).

335
Journal of Educational and Behavioral Statistics June 2011 36: 403-412, first published on February 16,
2011doi:10.3102/1076998610388778

Karl Gustav Jöreskog (April 25, 1935, to present) was born in Amål, Sweden. Karl received his bachelor’s, master’s, and doctoral
degrees at Uppsala University in Sweden. He is currently an emeritus professor at Uppsala University. He was a doctoral student of
Herman Wold, and branched out with a hypothesis testing approach to structural equation modeling, LISREL (linear structural
relations). LISREL used the maximum likelihood estimation method and hypothesis testing approach for latent variable models,
under the assumption of multivariate normality. This was in contrast to the partial least squares approach pioneered by Herman
Wold. He worked at the Educational Testing Service and spent a short time as a visiting professor at Princeton University. During
this time, he improved on his analysis of covariance structures using maximum likelihood estimation in factor analysis (Jöreskog,
1969; 1970). In 2001, a Festschrift was published in his honor to highlight his contributions in structural equation modeling
(Cudeck, Jöreskog, Du Toit, & Sörbom, 2001). In 2007, Karl received an award for his distinguished scientific contributions in
psychology from the American Psychological Association.

336
Overview
Structural equation modeling (SEM) has been called by different names over the years: covariance structure
analysis and latent variable modeling. Today, there appears to be an acceptance of the term structural equation
modeling, which was derived from the field of econometrics. The SEM approach involves two separate
models: a confirmatory factor analysis (CFA) model and a structural equations model. The CFA hypothesizes
and tests a measurement model where the observed variables serve as indicators of one or more latent variables.
The CFA model provides the estimates to compute the latent variable scores. The structural equation model
hypothesizes the relation among the latent variables and solves the set of simultaneous linear equations.

Since the 1970s, there have been many improvements to the LISREL software program developed by Karl
Jöreskog and Dag Sörbom (Jöreskog & Sörbom, 1979; LISREL, 2014). LISREL initially required a good
working knowledge of matrix operations in specifying the measurement and structural models. Today,
LISREL has become user friendly with many different (SEM) applications (Schumacker & Lomax, 2010).
Many other SEM software packages have also been developed: AMOS, EQS, Mx, OpenMx, Mplus, Sepath to
name only a few. Each of the software packages offered a unique approach or user-friendly interface to expand
the types of data analysis possible. Today, SEM is no longer limited to linear relations among latent variables
(Schumacker & Marcoulides, 1998). The numerous textbooks on SEM have also helped expand the various
types of applications used in many disciplines today (medicine, education, psychology, business, nursing,
military, engineering, etc.).

A logical extension away from commercial packages was the development of free software to conduct SEM,
for example, Mx, a matrix operator software package, or OpenMx, which adds path model diagramming
capabilities. More recently, R software is available to conduct SEM. This software is differentiated from the
factor analysis software I presented earlier because it combines the test of a measurement model (CFA) with
the additional testing of the relations among the latent variables in a structural model. This chapter will
therefore cover the basic assumptions, an understanding of the difference between a correlation and covariance
matrix, a brief discussion of the different R packages, and finally CFA and SEM examples using R functions.
I am only covering basic types of models and refer you to the list of books under the Web Resources section
for further reading.

337
Assumptions
SEM is a correlation based approach for studying multiple variable relations, which is in contrast to
multivariate methods that test mean differences (MANOVA). Correlation methods use partial correlations to
control for extraneous variables that influence the bivariate correlation, thus obtaining the unique relation
between a pair of variables. Research designs using random sampling and random assignment are used to
control extraneous variables when testing mean differences. Historically, this has been the Sir Ronald Fisher
analysis of variance approach versus the Karl Pearson correlation approach. The mean difference and the
correlation method were brought together in the analysis of covariance (ANCOVA, MANCOVA), where
group means were adjusted based on a common regression weight (slope) from a set of covariate variables. I
also showed an alternative approach, the propensity score method, which matches subjects to control for extraneous variables and does not require the statistical control of prior group differences. Today, this blending of
statistical methods is being referred to as the general linear model.

Multivariate methods, whether mean difference or correlation based, require the assumptions addressed in
previous chapters. SEM, which encompasses all of the other statistical methods, is therefore more affected by
the violations of assumptions discussed in the book (see Chapter 2). SEM uses observed and latent variables,
which requires additional assumptions. Consequently, multivariate methods, including SEM, that use means
and correlation are sensitive to data issues involving missing data, outliers, nonnormality, restriction of range,
and nonlinearity (Schumacker, 2014).

338
Multivariate Normality
We can test the assumption of multivariate normality in SEM using R commands from the MVN package by
Selcuk Korkmaz and Dincer Goksuluk (> help(package=MVN)). We will test the multivariate normality of
variables in the Iris data set. The Iris data set contains 150 samples from three species of Iris (setosa, versicolor,
virginica) with four measures of length and width for sepals and petals. The data set is in the base R package,
so simply issue the following command to access the data.
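For example:

  data(iris)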

Next, we install and load the MVN package.
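For example:

  install.packages("MVN")   # if not already installed
  library(MVN)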

We will be selecting only the setosa variety of Iris and the four variables to avoid the factor variable, Species.
The covariance matrix must be a square matrix with numeric variables only.
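A sketch (the object name setosa is arbitrary):

  setosa <- iris[iris$Species == "setosa", 1:4]   # 50 rows, numeric variables only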

The Mardia test of multivariate normality (Mardia, 1970) is based on skewness and kurtosis measures, and
computed with the following R command:
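In current versions of the MVN package the test is requested through the mvn() function (older versions used a mardiaTest() function); a sketch:

  mvn(setosa, mvnTest = "mardia")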

The chi-square values for skewness (g1p = 3.079, p = .177) and kurtosis (g2p = 26.537, p = .195) are both
nonsignificant, thus the Mardia test indicated that the data were multivariate normal. The Shapiro–Wilk test
of normality is in the mvnormtest package. The package will need to be installed and loaded as follows:
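For example:

  install.packages("mvnormtest")   # if not already installed
  library(mvnormtest)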

The Shapiro–Wilk test requires that the covariance matrix be transposed. This is done using the built-in t()
function. The mshapiro.test() function then includes this transposed file.
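A minimal sketch:

  mshapiro.test(t(setosa))   # variables must be in rows, hence the transpose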

The W = .95 with p = .07 indicated a nonsignificant value; therefore, multivariate normality is assumed for the setosa variety
of Iris measurements.

The Shapiro–Wilk test of multivariate normality for the versicolor Iris variety and its four measurements is also computed as follows:
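A sketch (the subset object name is arbitrary):

  versicolor <- iris[iris$Species == "versicolor", 1:4]
  mshapiro.test(t(versicolor))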

The four measures on the versicolor Iris variety did not indicate multivariate normality (W = .93, p = .0005).
We will use these two data sets and separate covariance matrices later in the chapter.

340
Positive Definite Matrix
A very disturbing message often appears when a researcher does not properly screen the raw data. The
message usually indicates Nonpositive Definite Matrix, so the program stops! This is befuddling to researchers
without a background in matrix algebra. So what does it mean? Well, the matrix is not full rank, or it does not
meet the order condition. Okay, still confused? In practical terms, the matrix values do not compute! This can
be due to matrix values exceeding their expected values, for example, r > 1, missing values, or linear
dependency—all leading to not being able to obtain parameter estimates, that is, obtain a solution to the set of
equations.

There are a few things we can quickly check to determine if our covariance (correlation) matrix is desirable for
analysis. These steps include checking whether the matrix is an identity matrix, whether the determinant of the matrix is zero or negative, whether the eigenvalues of the matrix are positive, and whether the data are multivariate normal. These checks are accomplished in R using the following commands.

1. Check for identity matrix
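One way to run this check is Bartlett's test of sphericity from the psych package (an assumption about the function used here):

  library(psych)
  cortest.bartlett(cor(setosa), n = nrow(setosa))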

The statistically significant Bartlett test (χ2 = 611.988, df = 6, p < .0001) indicates that the covariance matrix
is not an identity matrix.

An identity matrix has off-diagonal values of zero and 1s on the diagonal, which was not the case in the Iris covariance matrix. Inverting the covariance matrix, C, involves the determinant of the matrix. If the determinant is zero, then the matrix inversion used in division is not possible. Typically, covariance (correlation) matrices with multicollinearity and/or linear dependency result in zero determinants. The identity matrix (I) is given by multiplying the covariance matrix, C, by its inverse, C−1, denoted as CC−1 = I.

2. Check determinant of matrix
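For example:

  det(cov(setosa))   # determinant of the covariance matrix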

The determinant of the covariance matrix is positive, that is, not zero or negative. The determinant of the matrix must be positive to permit matrix inversion. Division in matrix algebra is done by multiplication using inverted matrices. The determinant is a measure of generalized variance that takes into account the covariance in a matrix, thus variance–covariance. The trace of the matrix is the sum of the diagonal values—that is, the total variance, but without consideration of the effect of covariance.

3. Check eigenvalues of matrix
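For example:

  eigen(cov(setosa))$values   # all four eigenvalues should be positive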

The eigenvalues of the covariance matrix should be positive because they indicate the amount of variance
extracted by each set of eigenvectors. When solving a set of simultaneous equations, more than one equation
or solution is possible, given the algebraic quadratic equation. Each equation has a set of eigenvector values
that yields an eigenvalue. The single eigenvalue indicates the amount of variance for that particular solution.
Recall, the sum of the eigenvalues indicates the total amount of variance across all possible solutions; that is,
∑λi = .236 + .037 + .027 + .009 = .309, the total variance (trace) of the matrix.

4. Check multivariate normality of matrix

The Shapiro–Wilk test is nonsignificant (W = .85, p = .23), which indicates that the data are multivariate
normally distributed. Sometimes, data are not multivariate normally distributed when a single variable is not
univariate normally distributed. A data transformation on the non-normal variable values can result in
meeting the multivariate normality assumption. The Cramér–Wold theorem provides support for the
extension of univariate to multivariate normality, given all variables are normally distributed. In most cases,
violation of this assumption leads to erroneous results. As noted earlier, data transformations are generally
used to adjust variable distributions. SEM software provides other alternative estimation methods, for
example, weighted least squares method. In other cases, multivariate methods are robust to minor departures
from nonnormality.

Overall, if you have multivariate normality issues with your covariance (correlation) matrix that involve the
presence of an identity matrix, nonadmissible determinant value, or nonadmissible eigenvalues, then
parameter estimates will not be computed or will not be accurate estimates of population parameters.

342
343
Equal Variance–Covariance Matrices
In some SEM models, it is important to determine the groups that have equal variance–covariance matrices,
for example, multiple group models. The Box M test is a widely used test for the homogeneity of variance–
covariance matrices, which is an extension of the Bartlett univariate homogeneity of variance test (see Chapter
2). The Box M test uses the determinants of the within-group covariance matrices of each group, that is, the generalized variances of each group. The Box M test is sensitive to departures from multivariate normality, so that assumption should be checked first before checking the equality of group variance–covariance
matrices. Simply stated, the Box M test may be rejected due to a lack of multivariate normality, rather than
the covariance matrices being unequal. The Shapiro–Wilk test indicated that the multivariate normality
assumption did hold for the setosa Iris variety, but not for the versicolor Iris variety.

The biotools package has a boxM() function for testing the equality of variance–covariance matrices between
groups. The package can be installed from the main menu in R. To test group equality of the variance–
covariance matrix, a factor variable (categorical group membership) must be declared separately.
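A sketch using two Iris varieties (object names are arbitrary):

  library(biotools)
  iris2 <- subset(iris, Species %in% c("setosa", "versicolor"))
  boxM(iris2[, 1:4], droplevels(iris2$Species))   # grouping factor supplied separately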

The null hypothesis of equal variance–covariance matrices was rejected. The two different Iris varieties had
different variance–covariance matrices. An SEM analysis comparing groups would therefore be suspect,
especially since nonnormality existed among variables in the versicolor group.

Note: There are several univariate functions in R packages that test the equality of variance assumption
[var.test(); bartlett.test (); library(car) - leveneTest()], and others in Chapter 2 that show a comparison of five
different approaches.

344
Correlation Versus Covariance Matrix
SEM is based on using a covariance matrix of observed variables (hence the early use of covariance structure
analysis). Variable relations can be entered as a correlation matrix but will be treated as a covariance matrix
with the variables standardized. The correlation matrix by definition contains variables with a mean = 0 and
standard deviation = 1. The standardizing of variables makes comparisons between variables easier but
removes the scale difference among the variables. Recall in multiple regression where the standardized
solution yields an intercept = 0 compared with an unstandardized solution where the line of best fit falls at an
intercept point on the y axis. The intercept provides a baseline measure or starting point for the slope change
in y given a value of x. A covariance matrix, when not input as a correlation matrix, maintains the original
variable means and standard deviations. This is a very important distinction, which affects the interpretation
of results. SEM permits inputting raw data, correlation, or covariance matrices when analyzing different
model applications.

SEM uses a covariance (correlation) matrix in both the measurement and structural models. When the
correlation matrix is input, the standard errors can be biased (Cudeck, 1989; Steiger, 1980a, 1980b). SEM
software today provides a weighted estimation method to permit the correct standard errors, and standardized
solutions are possible (Raykov & Marcoulides, 2000). The standard error is used in the denominator of
statistical tests, for example, a t test, so if incorrectly estimated as either too low or too high, the statistical test
is biased. My best advice is to use the covariance matrix whenever possible and simply request the standardized solution in the software; this issue was summarized on SEMNET, a discussion website (http://www2.gsu.edu/~mkteer/covcorr.html).

There are several important functions that help our use of correlation and covariance matrices. SEM does not
require the use of the entire data set in most types of modeling; therefore, the cor() and cov() functions easily provide the full square matrix required in the analysis.

345
Basic Correlation and Covariance Functions
The cor() function is used to create a correlation matrix from raw data:
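For example, using the setosa measurements selected earlier:

  Iriscor <- cor(setosa)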

The cov() function is used to create a covariance matrix from raw data:
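For example:

  Iriscov <- cov(setosa)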

You can also convert a covariance matrix to a correlation matrix using the cov2cor() function:
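For example:

  cov2cor(Iriscov)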

The cor2cov() function in the lavaan package can be used to convert a correlation matrix to a covariance
matrix, but it requires the standard deviations, so I first computed these values from the data set, Iris.
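A sketch (the object name sds is arbitrary):

  sds <- apply(setosa, 2, sd)   # standard deviations of the four variables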

We must install and load the lavaan package to use the cor2cov() function. You must insert the variable
standard deviations in the function. The resulting covariance matrix is very similar to the original covariance
matrix (only difference is due to rounding error from standard deviation values).
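A minimal sketch:

  library(lavaan)
  cor2cov(Iriscor, sds)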

346
347
Matrix Input Functions
Journal articles and books typically report correlation and covariance matrices using only the lower or upper
triangle of the full square matrix. Statistical packages require a square matrix or full matrix for mathematical
operations. You can enter lower triangle values of a correlation or covariance matrix and have it converted to a
square (full) covariance (correlation) matrix by using the lower2full() function in the lavaan package. We
insert the lower triangle correlation values from the Iris correlation matrix, then create the square matrix as
follows:
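A sketch; the correlation values shown are rounded (approximate) setosa correlations, and the lower2full() call described in the text is named lav_matrix_lower2full() in current lavaan versions:

  library(lavaan)
  lower <- c(1,
             .74, 1,
             .27, .18, 1,
             .28, .23, .33, 1)
  Iriscor.full <- lav_matrix_lower2full(lower, diagonal = TRUE)
  rownames(Iriscor.full) <- colnames(Iriscor.full) <- colnames(setosa)
  Iriscor.full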

Note: This is a nice feature, especially when inputting a correlation or covariance matrix from a journal article
and conducting SEM analysis to replicate the results.

The cortest.mat() function in the psych package can be used to compare two correlation or covariance
matrices. For example, given the two hypothetical correlation matrices, are they the same or different?
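A sketch with two small made-up 4 × 4 matrices and arbitrary sample sizes (the chi-square reported next will differ for these particular values):

  library(psych)
  R1 <- matrix(c(1, .50, .40, .30,
                 .50, 1, .35, .25,
                 .40, .35, 1, .20,
                 .30, .25, .20, 1), 4, 4)
  R2 <- matrix(c(1, .45, .42, .28,
                 .45, 1, .30, .28,
                 .42, .30, 1, .22,
                 .28, .28, .22, 1), 4, 4)
  cortest.mat(R1, R2, n1 = 100, n2 = 100)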

The chi-square = 2.2, df = 6, and p = .90, which indicates that the two correlation matrices are similar. Note:
You must specify a sample size for each correlation matrix in the function.

To find out other features and arguments for the function use the following command:
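For example:

  help(cortest.mat)   # or: ?cortest.mat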

Herman Wold developed partial least squares modeling. Partial correlation indicates the unique bivariate
correlation controlling for other variables in the matrix, whereas the Pearson correlation indicates only the
linear bivariate relation of variables. The partial correlation is computed using the Pearson correlations
(Schumacker & Lomax, 2010). It is the controlling for other variables in correlation methods that makes the
partial correlation important—for example, in the estimation of regression weights, path coefficients, factor
loadings, or structure coefficients. In regression, when the partial correlation is higher than the Pearson
correlation, it indicates a suppressor variable. So, in many respects, our interest is in the partial correlation, not
in the Pearson correlation, when testing hypothesized theoretical models. The cor2pcor() function in the
corpcor package computes a partial correlation matrix directly from the Pearson correlation matrix or
covariance matrix, thus saving time in hand calculations.

We will use the correlation and covariance matrices from the Iriscor and Iriscov data files (setosa variety). They
are simply entered into the function as follows:
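A minimal sketch:

  library(corpcor)
  cor2pcor(Iriscor)   # partial correlations from the Pearson correlation matrix
  cor2pcor(Iriscov)   # the covariance matrix yields the same partial correlations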

Note: We would inspect these partial correlations to determine if any were higher than their corresponding
Pearson correlations. If so, we have a problem, which will surface in our data analysis. Fortunately, all of the
partial correlations are less than their corresponding Pearson correlations.

Alternatively, the pcor2cor() function takes a partial correlation or partial covariance matrix and computes the
corresponding correlation or covariance matrix.
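For example:

  pcor2cor(cor2pcor(Iriscor))   # recovers the original correlation matrix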

The matrix input functions may seem unusual; however, they are very common in correlation-based research
methods (regression, path, CFA, and SEM). The sample size, means, standard deviations, and correlation
matrix are generally all the information needed to be input for conducting CFA and structural equation
models. The covariance matrix, means, and sample size are another set of information that can be input for
data analysis. This summary information is helpful, especially when using large national databases, which are
becoming increasingly available for data analytics.

350
Reference Scaling in SEM Models
SEM software outputs both unstandardized and standardized solutions, similar to multiple regression and
path analysis (Schumacker & Lomax, 2010). Researchers are often confused about this, especially when
commercial software provides a reference scaling option, for example, a value of 1.0 assigned to the path of a reference variable. The
reference scaling does not affect the analysis solution. This was best explained by Karl Jöreskog and Dag
Sörbom (Jöreskog & Sörbom, 1993) when stating:

Latent variables are unobservable and have no definite scales. Both the origin and the unit of
measurement in each latent variable are arbitrary. To define the model properly, the origin and the
unit of measurement of each latent variable must be defined. (p. 173)

The latent variable typically defaults to a standardized value; that is, a mean = 0 and a standard deviation
(variance) = 1. This can be useful and meaningful when comparing latent variables. However, a researcher can
choose an observed variable to represent the scale for the latent variable. This is commonly referred to as a
reference variable. Generally, the reference variable that has the highest factor loading in the measurement
model is selected, which implies that it represents the latent construct best.

A researcher should therefore consider whether standardized values or reference variable scaling is most
meaningful for interpretation. This is a similar understanding when conducting multiple regression, where
equations can be interpreted with or without an intercept. SEM computes latent variable scores using the
factor loadings in the measurement model via a regression equation. The latent variable scores are used in the
structural equation model. Therefore, care should be taken in understanding whether standardized (z scores)
or raw score scaling is preferable when using a reference variable in interpreting latent variable effects.

351
R Packages

352
Finding R Packages and Functions
There are many packages and functions in R that make its usefulness seem overwhelming to many users. A
few functions can help you navigate around and find what you need.
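For example, any of these can be used to locate SEM-related packages and functions:

  help.search("structural equation")        # search installed documentation
  RSiteSearch("structural equation model")  # search the R websites
  apropos("sem")                            # objects on the search path containing "sem"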

You can also use websites that have helpful resources:

http://www.rseek.org # Web site to search for functions

or

http://cran.r-project.org/web/views/ # CRAN list of Views

Note: The ctv package needs to be installed to view contents:
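For example:

  install.packages("ctv")
  library(ctv)
  install.views("Psychometrics")   # installs the packages listed in the Psychometrics task view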

The Psychometrics view contains information about R packages and functions for structural equation models,
factor analysis, and PCA.

354
SEM Packages
Two widely used SEM packages in R are sem and lavaan.

sem
The sem package is by John Fox (2006), and a list of functions in the package is given using the help()
function.
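For example:

  library(sem)
  help(package = "sem")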

The sem package provides several different ways to specify the measurement models, cfa(), and the structural
models, sem(), using matrix notation, equations, or model statements. We will use the approach that uses
model statements with variable names. For example, using the Holzinger and Swineford covariance matrix on
six psychological variables for 301 subjects in Schumacker and Lomax (2010, p. 171), the R commands would
be as follows:
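A sketch using the sem package's cfa() helper (the exact syntax of the original listing may differ, and the fitted-object name is arbitrary); it assumes the 6 × 6 covariance matrix has been read into fullcov, as shown in the lavaan example below:

  library(sem)
  cfamodel <- cfa(text = "
    spatial: Visperc, Cubes, Lozenges
    verbal:  Parcomp, Sencomp, Wordmean
  ")                                    # factor covariances are free by default in cfa()
  cfa.sem <- sem(cfamodel, S = fullcov, N = 301)
  summary(cfa.sem)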

The CFA bifactor model is given the name cfamodel and is used in the sem() function along with the
covariance matrix and sample size. The summary() function provides the output, which includes the chi-
square test, factor loadings, and variable variances. The special use of double tilde, ~~, indicates that the latent
variables are to be correlated. The commands are entered one at a time in the RGui window.

Note: The psych package by William Revelle (2014) has several data sets, and it has interfaces with the sem
package to provide functions for CFA models and structural equation models, including simulation of data,
matrix creation, and graphing of models. More information is available at the following:

lavaan
The lavaan package by Yves Rosseel (2012) uses a similar method for specifying the theoretical model. The
latent variable names are specified (lv1; lv2), and each set equal to (= ~) a list of observed variables. The
correlation of the latent variables is signified by using the double tilde (~~). The entire model specification is
set between single quotation marks. The CFA bifactor model would be specified as follows:

The CFA model would be run using the cfa() function, where the sample covariance matrix and sample size
are specified. The summary() function would print out the results.
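A generic sketch of the specification and the cfa() call, using placeholder names (lv1, lv2, x1–x6, covmat, and the sample size are placeholders):

  library(lavaan)
  cfa.model <- '
    lv1 =~ x1 + x2 + x3
    lv2 =~ x4 + x5 + x6
    lv1 ~~ lv2
  '
  fit <- cfa(cfa.model, sample.cov = covmat, sample.nobs = 301)
  summary(fit)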

The lavaan package has numerous functions for handling matrices, fitting CFA models and latent variable
models. You can obtain more information about lavaan at the following:

We will use the lavaan package to conduct a few basic CFA and SEM models; there are so many different types of theoretical models today that it would take a full textbook to cover all of them. This should not
hinder your specifying a covariance matrix and type of model in R and running it. However, it is important to
pay attention to the arguments available in the functions, because they change the estimation method, output
the fit measures, provide modification indices, and produce either standardized or unstandardized results. A
comment (#) will be provided when needed to help you in each modeling application.

356
CFA Models

357
Basic Model
We first load the lavaan package. Next, we need to input the covariance matrix on six psychological variables
for 301 subjects from the Holzinger and Swineford study provided in Schumacker and Lomax (2010, p. 171).
Notice that we will be using the lower2full() function in the lavaan package and specifying that there are diagonal
values in the lower matrix. We will also use the names() function to provide variable names.

Model Specification
We now specify the CFA bifactor model to indicate two latent variables, spatial and verbal, with their
respective observed variable names from the covariance matrix. The two latent variables are correlated, spatial
~~ verbal, using the double tilde symbols.
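A sketch of the model statement (the object name cfa.model is arbitrary):

  cfa.model <- '
    spatial =~ Visperc + Cubes + Lozenges
    verbal  =~ Parcomp + Sencomp + Wordmean
    spatial ~~ verbal
  '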

Model Estimation and Testing


We run the CFA bifactor model using the cfa() function and output the results to a file, named cfa.fit.

The summary() function provides the output of the confirmatory factor analysis model.
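A sketch of the two calls (argument values follow the notes below):

  cfa.fit <- cfa(cfa.model, sample.cov = fullcov, sample.nobs = 301, std.lv = FALSE)
  summary(cfa.fit, standardized = FALSE, fit.measures = TRUE)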

The results indicated a poor data to model fit because the chi-square value is statistically significant (χ2 = 24.365, df = 8, p = .002). This indicates that the sample covariance matrix (S), fullcov, is not close to the
model-implied (reproduced) covariance matrix based on the CFA bifactor model. We seek a nonsignificant
chi-square value that would indicate that the sample covariance matrix and the model-implied covariance
matrix are similar.

Note 1: The argument, std.lv = FALSE uses the first observed variable for each latent variable as a reference
variable and sets the factor loadings to 1 (Visperc and Parcomp). When std.lv = TRUE, a standardized solution
is given for all observed variables.

Note 2: The argument, standardized = FALSE, provides the unstandardized coefficients and standard errors.
Changing this to standardized = TRUE would provide the standardized factor loadings.

Model Modification
We can explore model modification indices to improve the model fit. There is a function to guide this
selection, modindices(). Simply provide the cfa.fit file results. The boldfaced mi values (modification index)
indicate the ones of interest.
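For example:

  modindices(cfa.fit)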

This model modification output can be helpful, but it is also a little overwhelming. The column labeled mi
indicates the value for the modification indices. We wish to find the mi value that is not only the highest but
also makes the most theoretical sense to include in our CFA bifactor model.

The first six lines indicate the spatial latent variable with all 6 observed variables. Obviously, we specified that
spatial had three observed variables (Visperc, Cubes, and Lozenges), so the high mi = 7.969 would suggest that
Sencomp is a possible indicator of both spatial and verbal latent variables. The next 7 to 12 lines indicate the
verbal latent variable with all 6 observed variables. Obviously, we specified that verbal had three observed
variables (Parcomp, Sencomp, and Wordmean), so the high mi values for Visperc = 10.433 and Lozenges = 9.202
would suggest that they are possible indicators of both spatial and verbal latent variables. We could include
paths for these variables to show shared variance between the two latent variables, but this would completely
change the hypothesized CFA bifactor model. We should instead examine the error variances of the observed
variables.

The next 13 to 33 lines indicate whether error covariance needs to be correlated between pairs of observed
variables. The correlation of error covariance can be required when unexplained variability is not included in
the model. The mi values for the following pairs of observed variables have high mi values: Visperc ~~ Cubes =
9.202; Cubes ~~ Lozenges = 10.433, Lozenges ~~ Sencomp = 7.349, and Parcomp ~~ Wordmean = 7.969. The best
approach is to select the error covariance with the highest mi and make that single modification to the CFA
bifactor model. Making this change would not drastically alter my original hypothesized bifactor measurement
structure. You would rerun the modified model, then check again for modification indices if the chi-square
value was still statistically significant.

The modified CFA bifactor model with the Cubes ~~ Lozenges correlated error covariance specified would be
as follows:
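A sketch (the intermediate object names are arbitrary):

  fcfa.model <- '
    spatial =~ Visperc + Cubes + Lozenges
    verbal  =~ Parcomp + Sencomp + Wordmean
    spatial ~~ verbal
    Cubes   ~~ Lozenges
  '
  fcfa.fit <- cfa(fcfa.model, sample.cov = fullcov, sample.nobs = 301)
  summary(fcfa.fit, fit.measures = TRUE)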

The chi-square (χ2 = 13.976, df = 7, p = .052) now just misses statistical significance at the .05 level, exceeding the criterion by only .002, so the fit is borderline. In any case, with such ease and the use of high-speed computers, why not check to see what else might be required in the model. We can easily rerun the modification function again.

The Parcomp ~~ Wordmean = 5.883 error covariance had the highest mi value in the output (output not
shown). We would add this correlated error covariance term to the CFA bifactor model. The final CFA
bifactor model, fcfa2.model, was now specified as follows:
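A sketch (fcfa2.model is named in the text; the fitted-object name is arbitrary):

  fcfa2.model <- '
    spatial =~ Visperc + Cubes + Lozenges
    verbal  =~ Parcomp + Sencomp + Wordmean
    spatial ~~ verbal
    Cubes   ~~ Lozenges
    Parcomp ~~ Wordmean
  '
  fcfa2.fit <- cfa(fcfa2.model, sample.cov = fullcov, sample.nobs = 301)
  summary(fcfa2.fit, standardized = TRUE, fit.measures = TRUE)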

The CFA bifactor model chi-square was not statistically significant (χ2 = 7.451, df = 6, p = .281). The factor
loadings of the observed variables were statistically significant, and the spatial and verbal latent variables were
significantly correlated (r = .42; z = 6.157, p < .0001). The covariance terms I specified were both statistically
significant. These correlated error covariance terms were added not only based on their higher mi values but
also because it made sense to correlate pairs of observed variables on the same latent variable to address the
unmodeled error variance.

Model Diagram
The final model can be diagrammed to show the CFA bifactor structure using the lavaan.diagram() function
in the psych package. The R commands are given as follows with the outputted graph in Figure 12.1.
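A minimal sketch, assuming the fitted object fcfa2.fit from above:

  library(psych)
  lavaan.diagram(fcfa2.fit)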

The graph shows the unstandardized factor loadings and the correlation between the two latent variables
—spatial and verbal. The two correlated error covariance terms that were specified in the final model are also
indicated by curved arrows between the respective pairs of variables.

363
Multiple Group Model
The multiple group model tests whether two or more groups differ in the measurement model. The
measurement model involves selecting three or more observed variables to indicate a latent variable. The CFA
multiple group model involves testing whether the factor loadings are the same between the groups. In
practical terms, it is a test of measurement invariance. Measurement invariance implies that the groups do not
differ on the construct. There are SEM model applications where the assumption of measurement invariance
is important, thus indicating that the construct is the same for the groups.

A researcher however might be interested in testing a hypothesis that the latent variable was different between
two groups. In this case, we would want the factor loadings to be different, thus producing different latent
variable scores. (Recall, the factor loadings are used in a regression equation to compute the latent variable
scores.) Essentially, we could compute latent scores for each group, then compute an independent t test on the
latent scores to test for a mean difference.

Figure 12.1 Final CFA Bifactor Confirmatory Model

The CFA multiple group analysis will be conducted using covariance matrices of elementary school children
on three measures of reading and three measures of mathematics from Raykov and Marcoulides (2008, p.
317). There are 230 girls and 215 boys in the data file, ch9ex4.dat, that contains both covariance matrices.
They also conveniently provided two separate data sets with the covariance matrices: ch9ex4-boys and ch9ex4-
girls. You can either access their website, download the zip file, and extract these files, or you can directly
enter them from the covariance matrices listed in the book. In either case, you will need to use the lower2full()
function to read the lower triangle matrices.

We first load the lavaan package, then read in the two covariance matrices using the lower2full() function.

364
The R commands are as follows:
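
The numeric values below are arbitrary and only illustrate the lower2full() mechanics on a small 3 × 3 matrix; substitute the 6 × 6 lower-triangle values from Raykov and Marcoulides (2008) or the ch9ex4 data files to create girls.cov and boys.cov. (In newer lavaan versions, the same function is exported as lav_matrix_lower2full().)

library(lavaan)
demo.lower <- c(1.00,
                 .50, 1.00,
                 .40,  .45, 1.00)     # lower triangle entered row by row
demo.cov <- lower2full(demo.lower)    # returns the full symmetric matrix
demo.cov
# girls.cov and boys.cov are created the same way from their lower triangles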

We next assign variable names and declare the covariance matrices as a matrix, which is required for
computations in the functions. The R commands are as follows:
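
A sketch of those steps, assuming girls.cov and boys.cov were created as above and that the six observed variables are RA1 to RA3 and MA1 to MA3 (the names used later in this section):

var.names <- c("RA1", "RA2", "RA3", "MA1", "MA2", "MA3")
rownames(girls.cov) <- colnames(girls.cov) <- var.names
rownames(boys.cov)  <- colnames(boys.cov)  <- var.names
girls.cov <- as.matrix(girls.cov)    # declare as a matrix for the cfa() computations
boys.cov  <- as.matrix(boys.cov)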

We once again use the model statement approach to indicate which observed variables are indicated on which
factors. The bifactor model is given the name, cfa2.model, with the read and math factor names, and the two
factors correlated. We will also create a file, fit.index, with the names of a few model fit indices. There are
many subjective fit indices, but these are the ones chosen for this example. The R commands are as follows:
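
A minimal sketch of the model statement and fit index file; the particular fit measures listed are an assumption (lavaan's fitMeasures() accepts names such as these):

cfa2.model <- '
  read =~ RA1 + RA2 + RA3
  math =~ MA1 + MA2 + MA3
  read ~~ math
'
fit.index <- c("chisq", "df", "pvalue", "gfi", "cfi", "rmsea")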

365
We proceed with the multiple group analysis by first running each group separately to determine if girls fit the
CFA model, and whether boys fit the CFA model. Any differences in their individual model fit, for example,
one of the factor loadings, would require constraining that value to be different between the groups in the
multiple group analysis. The CFA model is run separately using the cfa() function, and fit indices are output
using the fitMeasures() function.
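
A sketch of the separate runs, using the sample sizes reported above (230 girls, 215 boys):

girls.fit <- cfa(cfa2.model, sample.cov = girls.cov, sample.nobs = 230)
fitMeasures(girls.fit, fit.index)
summary(girls.fit, standardized = TRUE)

boys.fit <- cfa(cfa2.model, sample.cov = boys.cov, sample.nobs = 215)
fitMeasures(boys.fit, fit.index)
summary(boys.fit, standardized = TRUE)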

Note: The chi-square is nonsignificant (χ2 = 14.45, df = 8, p = .07), so we have a good data to model fit, that is,
the sample covariance matrix, girls, is not statistically different from the model-implied (reproduced)
covariance matrix. When χ2 = 0, then S = ∑, where S is the original covariance matrix and ∑ is the covariance
matrix implied by the measurement model.

366
Note: These results are similar to those reported by Raykov and Marcoulides (2008).

The chi-square is nonsignificant (χ2 = 13.21, df = 8, p = .105), so we have a good data to model fit, that is, the
sample covariance matrix, boys, is not statistically different from the model-implied (reproduced) covariance
matrix. The hypothesized bifactor measurement model reproduces 98% of the original covariance matrix (GFI
= 98%). The RA1, RA2, and RA3 observed variables indicate a read latent variable and the MA1, MA2, and
MA3 observed variables indicate a math latent variable. The read and math latent variables are significantly
correlated (z = 4.153, p < .0001).

367
Note: These results are similar to those reported by Raykov and Marcoulides (2008).

Chi-Square Difference Tests


The girls’ and boys’ CFA models both yield a chi-square value, which indicates the closeness of the S and ∑
matrices. A chi-square difference is used to test whether the two groups have similar or different CFA
measurement models. If the chi-square is statistically significant, the two groups have different CFA models.
We desire a nonsignificant chi-square value. The anova() function in the base stats package can be used to
compare fitted models.

368
The anova() function can test different types of fitted models. The test is called a chi-square difference test. Basically, χ2(girls) − χ2(boys) = 14.45 − 13.21 = 1.24. This chi-square difference is not statistically significant, which indicates that the girls and boys do not have statistically significant differences in the factor loadings and factor correlation in the hypothesized bifactor measurement model. Because the girls' and boys' model chi-squares are not different, the assumption of measurement invariance is met, and they have similar reading and math ability.
Raykov and Marcoulides (2008) test other variations and comparisons, which are not further explored here.

Graphing Models
The lavaan.diagram() function in the psych package is used to diagram the girls’ and boys’ CFA models. The
standardized factor loadings are very similar, thus there is no difference in the CFA model between the girls
and the boys. The outputted graph with both girls’ and boys’ CFA models is shown in Figure 12.2.
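
A minimal sketch of the diagram commands, assuming the fitted models were saved as girls.fit and boys.fit:

library(psych)
lavaan.diagram(girls.fit)
lavaan.diagram(boys.fit)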

Figure 12.2 Bifactor CFA of Girls and Boys

369
Structural Equation Models

370
Basic SEM Model
In a structural equation model, one or more independent latent variables predict one or more dependent latent variables. The independent and dependent latent variables are created in the CFA
measurement models, and the researcher designates which ones are independent and which are dependent. It
is therefore important to have a good data to model fit in CFA prior to testing relations among the latent
variables in a structural equation model.

The independent latent variables are designated as ξ (ksi), while the dependent latent variables are designated
as η (eta). The basic SEM model with one independent latent variable and one dependent latent variable can
be diagrammed as seen in Figure 12.3. It is hypothesized that a person’s statistical imbalance is a predictor of
his or her statistical understanding. The researcher would hypothesize that statistical anxiety, attitude toward
statistics, and statistical stress are indicators of a person’s statistical imbalance, which would reflect a person’s
level of statistical understanding. Statistical understanding is indicated by statistical thinking, statistical
literacy, and statistical reasoning. A high statistical imbalance would be related to a low statistical
understanding, and vice versa. Therefore, the structure coefficient, designated as γ (gamma), as shown in Figure 12.3, would be hypothesized to be negative and statistically significant.

The data are a covariance matrix from Tabachnick and Fidell (2007, p. 686) that contains 5 observed variables
and 100 skiers. They ran their hypothesized model using MATLAB for matrix multiplications and EQS 6.1
for the structural equation model analyses. My results differ somewhat from their estimates based on their
using the raw data set and my using the truncated covariance matrix provided in the text. They hypothesized
that love of skiing (LOVESKI) was indicated by number of years skiing (NUMYRS) and number of days
skiing (DAYSKI), and that ski trip satisfaction (SKISAT) was indicated by satisfaction with the snow
(SNOWSAT) and satisfaction with the food (FOODSAT). In addition, degree of sensation seeking
(SENSEEK) directly predicted SKISAT. The hypothesized structural model was diagrammed as seen in
Figure 12.4:

Figure 12.3 Hypothetical Model: Statistical Imbalance Predicting Statistical Understanding

371
Figure 12.4 Hypothetical Structural Model

Source: Tabachnick and Fidell (2007, p. 687).

The asterisks, *, denote the factor loadings and structure coefficients that will be estimated for the model. I
begin by loading the lavaan package and inputting the covariance matrix. We once again use the lower2full()
function to output a square covariance matrix. The matrix contains row and column names, and covariance
data have been specified as a matrix.

The structural equation model is specified according to Figure 12.4 using the following commands:
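
A minimal sketch of the model statement, matching the description in the next paragraph:

basic.model <- '
  loveski =~ NUMYRS + DAYSKI
  skisat  =~ SNOWSAT + FOODSAT
  skisat   ~ SENSEEK + loveski
'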

The model is called a basic.model, and the first two lines contain the latent variables, loveski and skisat, with
their respective observed variables. The third line indicates that the latent variable, skisat, is predicted by
SENSEEK and loveski. Notice, you can have latent variables and observed predictor variables in a structural
equation model.

Only a few model fit indices were output to compare with the results reported by the authors. The model fit
values were placed in a file using the fitMeasures() function. The R commands to compute the estimates for
the structural equation model and output the results are as follows:
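
A sketch of those commands, assuming the covariance matrix read in above was saved as skisat.cov (a placeholder name) with 100 observations:

basic.fit <- sem(basic.model, sample.cov = skisat.cov, sample.nobs = 100)
fit.values <- fitMeasures(basic.fit, c("chisq", "df", "pvalue", "cfi", "rmsea", "gfi"))
fit.values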

372
The χ2 = 9.432 (df = 4, p = .051), compared with the χ2 = 9.337 (df = 4, p = .053) reported in their book. The
other fit measures were also close: cfi = .942 (.967); RMSEA = .117 (.116); and gfi = .965 (.965). The χ2 was
nonsignificant, therefore the sample covariance data fit the hypothesized structural model. The gfi = 97%, so
the hypothesized structural model reproduced 97% of the sample covariance matrix, which would leave small
residual values in the residual matrix (S−∑ = residual matrix).

The estimates for the structural model are given using the summary() function. An argument was included to
output the standardized values. The columns labeled std.lv and std.all provide the standardized estimates. The
boldfaced ones in the analysis output are close to those reported by Tabachnick and Fidell (2007, p. 694). The
standardized estimates are shown in Figure 12.5.
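
A sketch of the summary command with the standardization argument:

summary(basic.fit, standardized = TRUE)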

Figure 12.5 Hypothetical Structural Model

373
The major difference was that the structure coefficient for loveski predicting skisat was not statistically
significant (z = 1.73, p = .08) but was reported as statistically significant by Tabachnick and Fidell (2007)
using EQS 6.1 and the raw data set. This points out a danger of reading in truncated values from a correlation or covariance matrix. Values with at least five decimal places are needed in most cases to obtain similar results.

Note: Using the structure.diagram() function in the psych package requires specifying the factor loadings for x
variables (fx), factor loadings for y variables (fy), and the structure coefficient between the two latent variables (Phi). There is no
option for including an observed variable predicting fy. An example set of commands given in R for fx, fy, and Phi that produce the
diagram f1 would be as follows:

For our example, the basic commands for diagramming the model, minus SENSEEK predicting skisat, are given below. It is better
to use a good drawing package!
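
A hedged sketch for our example is given below. The loading and path values are illustrative placeholders (the standardized estimates from the output could be substituted), and the layout of Phi follows the example in the psych documentation, with the loveski-to-skisat path in the off-diagonal position:

library(psych)
fx  <- matrix(c(.9, .5), ncol = 1)          # NUMYRS, DAYSKI on loveski (placeholders)
fy  <- matrix(c(.7, .6), ncol = 1)          # SNOWSAT, FOODSAT on skisat (placeholders)
Phi <- matrix(c(1, 0,
                .4, 1), ncol = 2)           # loveski to skisat path (placeholder)
structure.diagram(fx, Phi, fy,
                  labels = c("NUMYRS", "DAYSKI", "SNOWSAT", "FOODSAT"))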

374
375
Longitudinal SEM Models
Longitudinal growth models (LGM) in SEM are similar to multivariate repeated measures; however, both latent and observed variables can be used in the models. LGM applications generally require large sample sizes, multivariate normality, equal time intervals, and change that occurs along the time continuum (note: applications today have also demonstrated time-varying intervals). In addition, LGM permits testing several different model configurations, ranging from intercept-only models (mean change across time) to intercept and slope models (mean and rate of change across time) with error variance specifications. LGM therefore permits testing many different models based on the constraints imposed on the model, for example, linear versus quadratic growth or equal error variances (Bollen & Curran, 2006; Duncan, Duncan, Strycker, Li, & Alpert, 1999). The observed variable means provide the intercepts, and the covariance matrix provides the information for the slope (rate of change). Together, they are the sufficient statistics required for LGM. I
present a basic growth model (intercept and slope), and a more advanced growth model (intercept and slope
model with predictor and covariate variables).

Basic Latent Growth Model


The intercept and slope for the model are defined in a factor matrix (Λ) with the required coefficients for the intercept and linear slope. The first column contains a fixed loading of 1 for the intercept at each time period (Age11 to Age15), which identifies a baseline reference. The second column contains the coefficients that designate a linear slope. These coefficients could be changed if testing a quadratic slope, similar to how it is done in regression analysis, or left free to vary (Raykov & Marcoulides, 2008; Schumacker & Pugh, 2013).

Λ = [ 1  0
      1  1
      1  2
      1  3
      1  4 ]

The latent growth model (LGM) conceptually involves two different analyses: individual averages across time
(intercepts) and individual rate of change from the intercept (slope). The slope or rate of change can be
modeled as linear increases in growth or any other polynomial coefficient.

The basic LGM illustrates the intercept and slope for 168 adolescents over a 5-year period, Age 11 to Age 15,
on tolerance toward deviant behavior (Schumacker & Lomax, 2010). The latent growth curve model is
diagrammed in Figure 12.6 (Schumacker & Lomax, 2010, p. 342).

The following R commands load the lavaan package, read in the correlation matrix, and read in the means of
the observed variables (data were log transformed).

376
Figure 12.6 Latent Growth Model (Linear)

The LGM for linear growth is specified between single quote marks. The intercept is defined as i and linked to the observed variables with the =~ operator using the intercept coefficients (1s). The slope is defined as s and also linked to the observed variables with the =~ operator using the linear coefficients from the factor matrix (0 to 4). This model would be considered an unconstrained model because no restrictions are placed on the residual error variances.
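
A minimal sketch of that specification, using the Age11 to Age15 variable names and the intercept (1) and linear slope (0 to 4) coefficients from the factor matrix above:

lgm.model <- '
  i =~ 1*Age11 + 1*Age12 + 1*Age13 + 1*Age14 + 1*Age15
  s =~ 0*Age11 + 1*Age12 + 2*Age13 + 3*Age14 + 4*Age15
'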

The model is run using the growth() function in the lavaan package. The arguments for the function include
the name of the model statement, the sample covariance matrix, the number of observations, and the sample means. The R commands are as follows:
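
A sketch of the growth() call, assuming the correlation matrix and means read in above were saved as tolerance.cor and tolerance.means (placeholder names):

lgm.fit <- growth(lgm.model, sample.cov = tolerance.cor,
                  sample.mean = tolerance.means, sample.nobs = 168)
summary(lgm.fit, fit.measures = TRUE)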

The output indicated a poor model fit, χ2 = 39.079, df = 10, p < .0001, so another type of latent growth model
should be specified. Sometimes a model is specified that constrains the error variances to be equal across the time continuum.

A second model was run, this time constraining the residual variances to be equal across the time continuum.
The equality constraint on the error variances is accomplished by specifying each variable's residual variance with the double tilde operator (~~) and attaching a common label, r, which forces those variances to be equal. The modified model is specified as follows:
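
A minimal sketch of the constrained specification, using the common label r on each residual variance:

lgm2.model <- '
  i =~ 1*Age11 + 1*Age12 + 1*Age13 + 1*Age14 + 1*Age15
  s =~ 0*Age11 + 1*Age12 + 2*Age13 + 3*Age14 + 4*Age15
  Age11 ~~ r*Age11
  Age12 ~~ r*Age12
  Age13 ~~ r*Age13
  Age14 ~~ r*Age14
  Age15 ~~ r*Age15
'
lgm2.fit <- growth(lgm2.model, sample.cov = tolerance.cor,
                   sample.mean = tolerance.means, sample.nobs = 168)
summary(lgm2.fit, fit.measures = TRUE)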

378
The output resulted in a worse model fit, χ2 = 70.215, df = 14, p < .0001; therefore, the error variances need to vary across time. The error variances of the observed variables were constrained to be equal (r = .590), as seen in
the output. We need to specify another model structure that adequately explains the intercept (mean
difference) and slope (rate of change) in the adolescent tolerance of deviant behaviors.

379
The first LGM indicated a better model fit than the second LGM. In the first LGM, the variances from
Age11 to Age12 increased, whereas the variance decreased from Age14 to Age15. This signals a rationale for
specifying an intercept and slope model with error covariances specified for these pairs of variables. The third model tested therefore includes Age11 ~~ Age12 and Age14 ~~ Age15, which correlates their error variances. The R commands are given as follows:
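
A minimal sketch of the third model and its growth() call:

lgm3.model <- '
  i =~ 1*Age11 + 1*Age12 + 1*Age13 + 1*Age14 + 1*Age15
  s =~ 0*Age11 + 1*Age12 + 2*Age13 + 3*Age14 + 4*Age15
  Age11 ~~ Age12
  Age14 ~~ Age15
'
lgm3.fit <- growth(lgm3.model, sample.cov = tolerance.cor,
                   sample.mean = tolerance.means, sample.nobs = 168)
summary(lgm3.fit, standardized = TRUE)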

The output indicated a good model fit, χ2 = 7.274, df = 8, p = .507, so the data fit this LGM model. The
individual intercept and slope values (boldfaced) are given in Table 12.1. The intercept values decrease, then
increase at Age15. When we compare Age11 intercept (.766) with the final Age15 intercept (.754), the
intercept (mean) tolerance for deviant behavior has decreased. A linear intercept increase would ideally have been a cleaner interpretation; however, the correlated error variances adjusted for this. The slope (rate) of change indicated a linear increase from Age11 to Age15. The negative coefficient between intercept and slope (−.091) was statistically significant, z = −2.412, p = .016. We can therefore attribute the decrease in tolerance for deviant behavior to an increase in age. Adolescents who get older are less tolerant of deviant behavior.

381
Note: The covariances of Age11 ~~ Age12 and Age14 ~~ Age15 were statistically significant, so they are important in improving model fit.

There are other commands in R that provide additional output, which I have not discussed in detail here. Some of these R commands are as follows:
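
For example, the following lavaan functions (common choices, not necessarily the exact ones shown in the original output) provide additional information about a fitted model such as lgm3.fit:

parameterEstimates(lgm3.fit)   # full parameter table with confidence intervals
fitted(lgm3.fit)               # model-implied covariance matrix and means
residuals(lgm3.fit)            # residual (S minus Sigma) matrix
modindices(lgm3.fit)           # modification indices
inspect(lgm3.fit, "rsquare")   # R-square values for the observed variables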

382
Advanced Latent Growth Model
The basic LGM may not always answer the research question of interest. For example, there might be
predictor variables that explain the intercept and slope values and/or covariate variables that affect the
dependent latent variables’ change over time. A more advanced LGM is presented using data from Raykov
and Marcoulides (2008; ch13ex1_mcm.dat). The data set contains 400 high school students who were
measured on college aspiration and school motivation from 9th to 12th grade (4-year time period). College
aspiration was the dependent variable and school motivation a corresponding continuous covariate variable.
The parents’ dominance and encouragement for academic progress were measured at the start of the study and
constituted two predictor variables. Their original analysis included a latent variable for the two predictors;
however, I modeled them as separate predictor variables. My results will therefore differ slightly from their
reported results. Also, I did not conduct the many different constraints on the LGM model they presented in
their book.

The advanced LGM proceeds with loading the lavaan package, reading in the covariance matrix, and sample
means. Please note that we have continued to use the lower2full() function and must declare the data file as a matrix with the as.matrix() function to permit computations (these are standard steps for lower triangle data matrices and must be completed prior to running the model). The R commands are as follows:

383
The advanced LGM is specified not only to include the intercept and slope but also to indicate the predictor
variables and the covariate variables. Figure 12.7 shows the relations in the advanced LGM. The intercept has
the constant coefficient to indicate the means of the four time periods (9th to 12th grade). The slope
coefficients indicate a linear growth. The covariate variables (MOTIVN1 to MOTIVN4) are indicated with
each corresponding college aspiration measure (COLLASP1 to COLLASP4). Finally, the two predictor
variables, PARSTYL1 and PARSTYL2, are shown predicting the intercept and slope values.

Figure 12.7 Advanced Latent Growth Model (Linear)

384
The R commands that reflect this advanced latent growth model are as follows:
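
A minimal sketch of the model statement implied by Figure 12.7 and the description above:

advlgm.model <- '
  i =~ 1*COLLASP1 + 1*COLLASP2 + 1*COLLASP3 + 1*COLLASP4
  s =~ 0*COLLASP1 + 1*COLLASP2 + 2*COLLASP3 + 3*COLLASP4
  # time-varying covariates paired with each college aspiration measure
  COLLASP1 ~ MOTIVN1
  COLLASP2 ~ MOTIVN2
  COLLASP3 ~ MOTIVN3
  COLLASP4 ~ MOTIVN4
  # predictors of the intercept and slope
  i ~ PARSTYL1 + PARSTYL2
  s ~ PARSTYL1 + PARSTYL2
'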

The specified advanced latent growth model is run using the growth() function, which requires including the
specified model, sample covariance matrix, sample means, and the number of observations. The R command
is as follows:
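
A sketch of that call, assuming the covariance matrix and means were saved as collasp.cov and collasp.means (placeholder names):

advlgm.fit <- growth(advlgm.model, sample.cov = collasp.cov,
                     sample.mean = collasp.means, sample.nobs = 400)
summary(advlgm.fit, standardized = TRUE, fit.measures = TRUE)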

The results are output using the summary() function.

The results indicated a close fit, χ2 = 35.785, df = 21, p = .023. The intercept values decrease over time, and
the slope values increase over time (see column Std.all). The second predictor, PARSTYL2, was not
statistically significant (z = .77, p = .44; z = − .234, p = .815). The model will therefore be rerun dropping this
predictor variable.

385
The model fit was still very close, χ2 = 33.983, df = 19, p = .02, and this time, all variables were statistically
significant. The PARSTYL1 regressions with the intercept and slope were statistically significant, and the
covariates were all statistically significant with the corresponding dependent latent variables. The intercept
and slope values were essentially the same as before. The slope estimate, .361, was positive and statistically significant (z = 6.272, p < .001). This indicated that college aspiration increased from 9th to 12th grade in correspondence with school motivation and parental dominance. Unfortunately, parent encouragement for
academic progress did not provide a good prediction of the mean change in college aspiration across the
school years. Note: Raykov and Marcoulides (2008) provide additional models and discussion of their LGM.

387
388
Reporting and Interpreting Results
There are many different types of latent growth models, including mixed methods, nonlinear, covariates,
predictors, and multilevel designs. The multivariate statistics books referenced in the Preface and the
references in this chapter are a good source for a more in-depth discussion of these modeling types. A write-
up of the basic LGM will provide some guidance on the terminology for reporting and interpreting the
results.

The data contain 168 adolescents who were measured on their tolerance toward deviant behavior from Age 11
to Age 15. A latent growth model was tested for intercept and linear growth across the five age periods.
Preliminary findings indicated a linear increase in slope values, but an increase, then decrease in intercept
values. A final model was run, which included correlated error variances between Age 11 and Age 12 and between Age 14 and Age 15. The final model had a good data to model fit, chi-square = 7.27, df = 8, p = .51.
The correlation estimate between the intercept and slope was statistically significant and negative (r = − .09, z
= −2.41, p = .02). The results therefore indicated that as adolescents got older, they became less tolerant of
deviant behavior.

389
Summary
The SEM approach involves two separate models: a CFA model and a structural equations model. The CFA
model hypothesizes and tests a measurement model where the observed variables are indicated on one or more
latent variables. The CFA model provides the estimates to compute the latent variable scores. The structural
equation model hypothesizes the relation among the latent variables and solves the set of simultaneous linear
equations. Today, there are many different SEM model applications, some of which we covered in this
chapter.

SEM applications have evolved to include models with continuous variables, ordinal variables (Jöreskog &
Moustaki, 2001), or both. The assumption of multivariate normality has been relaxed due to modern robust
estimation methods and the linear transformation of variables. A positive definite matrix, however, is required to compute parameter estimates; the determinant of the matrix must be positive. Also,
the number of distinct values in the correlation or variance–covariance matrix must be greater than the
number of parameters estimated. This is referred to as the degrees of freedom, which must be equal to or
greater than 1 in value.

SEM modeling steps proceed along a logical progression from model specification, model estimation and
testing, model modification, and model diagramming. Specifying a model based on prior research and theory
is the more difficult aspect of SEM. There is no single SEM model, so selection of variables and theoretical
support for the relation among the variables are critical. The estimation of parameters depends on what
estimation method is chosen (least squares, generalized least squares, maximum likelihood, etc.). The model
testing is based on interpreting the chi-square statistic and related subjective indices. Sometimes the initial
model does not fit the data, so modification by adding or dropping paths is required. The researcher should
justify the modification changes to the model.

390
Exercises
1. Explain what is meant by a nonpositive definite correlation matrix.
2. Define determinant.
3. Define eigenvalue and eigenvector.
4. Explain the concept of reference scaling in SEM.
5. Run the following basic LGM with the hypothetical correlation matrix and variable means.

Use the lavaan package, lower2full(), and growth() functions. What are the results?

391
Web Resources
Books on Structural Equation Modeling

http://www2.gsu.edu/~mkteer/bookfaq.html#LawMax

Covariances Versus Pearson Correlations

http://www2.gsu.edu/~mkteer/covcorr.html

R psych Package

http://cran.r-project.org/web/packages/psych/vignettes/psych_for_sem.pdf

Website for list of R package content and types

http://cran.r-project.org/web/views/

Website to search for functions and references

http://www.rseek.org

392
References
Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation perspective. New York, NY:
Wiley.

Cudeck, R. (1989). Analysis of correlation matrices using covariance structure models. Psychological Bulletin,
105, 317–327.

Cudeck, R., Jöreskog, K. G., Du Toit, S. H. C., & Sörbom, D. (2001). Structural equation modeling: Present
and future: A Festschrift in honor of Karl Jöreskog. Skokie, IL: Scientific Software International.

Duncan, T. E., Duncan, S. C., Strycker, L. A., Li, F., & Alpert, A. (1999). An introduction to latent variable
growth curve modeling: Concepts, issues, and applications. Mahwah, NJ: Lawrence Erlbaum.

Fox, J. (2006). Structural equation modeling with the sem package in R. Structural Equation Modeling, 13,
465–486.

Hair, J. F., Jr., Hult, G. T. M., Ringle, C., & Sarstedt, M. (2014). A primer on partial least squares structural
equation modeling (PLS-SEM). Thousand Oaks, CA: Sage.

Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34, 183–202.

Jöreskog, K. G. (1970). A general method for analysis of covariance structures. Biometrika, 57, 239–251.

Jöreskog, K. G., & Moustaki, I. (2001). Factor analysis of ordinal variables: A comparison of three
approaches. Multivariate Behavioral Research, 36, 347–387.

Jöreskog, K. G., & Sörbom, D. (1979). Advances in factor analysis and structural equation models. New
York, NY: University Press of America.

Jöreskog, K. G., & Sörbom, D. (1993). LISREL 8: Structural equation modeling with the SIMPLIS
command language. Hillsdale, NJ: Lawrence Erlbaum.

LISREL (2014). LISREL User’s guide. Skokie, IL: Scientific Software International.

393
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3),
519–530.

Raykov, T., & Marcoulides, G. A. (2000). A method for comparing completely standardized solutions in
multiple groups. Structural Equation Modeling, 7, 292–308.

Raykov, T., & Marcoulides, G. A. (2008). An introduction to applied multivariate analysis. New York, NY:
Routledge (Taylor & Francis Group).

Revelle, W. (2014). psych: Procedures for personality and psychological research (R package Version 1.4.1).
Evanston, IL: Northwestern University.

Savalei, V., & Rhemtulla, M. (2012). On obtaining estimates of the fraction of missing information from
FIML. Structural Equation Modeling: A Multidisciplinary Journal, 19(3), 477–494.

Schumacker, R. E. (2014). Learning statistics using R. Thousand Oaks, CA: Sage.

Schumacker, R. E., & Lomax, R. G. (2010). A beginner’s guide to structural equation modeling (3rd ed.).
New York, NY: Routledge (Taylor & Francis Group).

Schumacker, R. E., & Marcoulides, G. A. (1998). Interaction and non-linear effects in structural equation
modeling (Eds.). Mahwah, NJ: Lawrence Erlbaum.

Schumacker, R. E., & Pugh, J. (2013). Identifying reading and math performance in school systems with
latent class longitudinal growth modeling. Journal of Educational Research and Policy Analysis, 13(3),
51–62.

Steiger, J. H. (1980a). Testing pattern hypotheses on correlation matrices: Alternative statistics and some
empirical results. Multivariate Behavioral Research, 15, 335–352.

Steiger, J. H. (1980b). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87,
245–251.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). New York, NY: Pearson
Education.

394
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2),
1–36.

395
13 Statistical Tables

14 Chapter Answers

403
Chapter 1: Introduction and Overview

404
Chapter 2: Multivariate Statistics: Issues and Assumptions

405
Chapter 3: Hotelling’s T2: A Two-Group Multivariate Analysis

406
Chapter 4: Multivariate Analysis of Variance
1. One-Way MANOVA

Results: The Basal, DRTA, and Strat groups differed on the joint means for the three posttest measures. The
joint means were Basal (17.76), DRTA (20.91), and Strat (20.13), so Basal is statistically different from the
other two groups.

2. Factorial MANOVA

407
3. List all data sets in R packages

408
Chapter 5: Multivariate Analysis of Covariance

409
Chapter 6: Multivariate Repeated Measures
1. The three assumptions to be met are independent observations, sphericity, and multivariate normality.
2. Two advantages of multivariate repeated measures over paired t tests are controlling the Type I error rate, so it has more power, and subjects serving as their own control, so fewer subjects are required.
3. Sphericity concerns the pattern of correlations among the repeated measures; it requires that the variances of the differences between pairs of repeated measures be equal.
4. Difference scores provide a control for sphericity. The test of parallelism or groups being similar across
time is conducted on the difference scores in a one-way MANOVA.
5. Given the following data set, ch5ex3.dat, conduct a multivariate repeated measures analysis using the lme4 package and lmer() function.

410
411
The multivariate repeated measures summary table provided the F values for the gender, time, and gender *
time effects. The gender * time effect is a test of parallelism—that is equal profiles between the groups. The F
value for the gender * time effect is nonsignificant (F = 1.6349, p = .20); therefore, we conclude that the groups
have parallel slopes. The F value for gender effect was not statistically significant (F = 2.2310, p = .14);
therefore, we conclude that males and females did not differ in their average induction reasoning. Finally, the
F value for time was statistically significant (F = 306.9423, p < .00001); therefore, we conclude that induction
reasoning was different across the four testing periods. We would report the means and standard deviations
using the basic R commands:

The induction reasoning means across time indicated that for the first three test periods, the means increased,
but in the last test period, the mean decreased. This would signify a nonlinear trend in the means. I reran the
model using nlmer() function for nonlinear mixed models and obtained the same results. I suspect that there
was no significant departure from linearity.

Note: The means in the describeBy() function above matched those in Raykov and Marcoulides (2008), but
the means for the time variable are slightly different from theirs (explains the slight difference in lmer analysis
results from theirs).

412
Chapter 7: Discriminant Analysis
1. (a) Mutually exclusive equal group sizes, (b) normality, (c) equal group variance–covariance, (d) no
outliers, and (e) no multicollinearity among independent variables.
2. MANOVA places group membership as independent variable with multiple continuous dependent
variables. Discriminant analysis places group membership as the dependent variable with multiple
continuous independent variables. The difference is that the dependent variables and independent
variables are located on the opposite side of the equation.
3. Conduct a discriminant analysis (a hedged code sketch for steps a through h is given after the note at the end of this answer).
a. Find list of data files, attach file, list first 10 records

b. Print first 10 lines of data file

c. Run discriminant analysis

d. Output group prediction, put in data frame, view first 10 lines

413
e. Assess the accuracy of prediction—total percent correct

f. Show cell counts and proportions

g. Calculate chi-square for classification accuracy

h. Calculate effect size

414
i. Interpret results

The group membership variable, period, indicated three conditions: before warning sign, after warning sign,
and sometime later. Speed (speed) was measured at 14 different locations (pair) with one site having a warning
sign and the other no warning sign (warning variable). The study investigated whether speed and warning
variables could distinguish between the three conditions (period). Group sizes were equal. Group means
showed an increase from 37.36 (Period 1), 37.46 (Period 2), to 38.64 (Period 3). Classification accuracy was
36%, which was statistically significant (Pearson chi-square = 52.56, df = 4, p < .0001). The effect size was r2 =
.01, which is a small effect size but statistically significant (Bartlett chi-square = 54.29, df = 2, p < .0001).
Although these findings were statistically significant, a researcher should be cognizant of how large sample
sizes inflate the chi-square value (the sample size was 8,437).

Note: The amis data set is in the boot library. It contains 8,437 rows and 4 columns. The study was on the
effect of warning signs on speeding at 14 locations. The group variable, period, represents (1) before warning
sign, (2) shortly after warning sign, and (3) sometime later. The speed variable was in miles per hour; the
warning variable was (1) sign present and (2) no sign erected; and pair variable was a number from 1 to 14
that indicated the location. Detailed information is available at > help.search(“amis”).
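
A hedged sketch of steps a through h, using the lda() function from the MASS package (the chapter's own function calls may differ):

library(boot)     # contains the amis data set
library(MASS)     # lda() for linear discriminant analysis
data(amis)
head(amis, 10)                                     # first 10 records

amis.lda <- lda(factor(period) ~ speed + warning, data = amis)
amis.lda

pred <- predict(amis.lda)                          # group predictions
class.table <- table(amis$period, pred$class)      # cell counts
class.table
prop.table(class.table)                            # cell proportions
sum(diag(class.table)) / sum(class.table)          # total percent correct
chisq.test(class.table)                            # chi-square for classification accuracy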

415
Chapter 8: Canonical Correlation
1. A researcher should first screen his/her data to avoid issues related to multicollinearity, outliers, missing
data, and small sample sizes, which affect statistical analyses. The important assumptions in canonical
correlation analysis are normally distributed variables, linear continuous variables, and equal variances
among the variables. Failure to investigate and correct these data issues and assumptions can affect the
results.
2. Discriminant analysis has a single categorical dependent variable, while canonical correlation has
multiple linear continuous dependent variables. Discriminant analysis is focused on how well a set of
independent variables can predict group membership (dependent variable), while canonical correlation is
interested in how well two linear sets of variables are correlated. The two linear sets of variables form a
dimension and reflect latent variables.
3. Run several R functions to report the matrices, the canonical correlations, unstandardized loadings, plot
of the dimensions, F test of canonical variates, and the standardized canonical loadings.

416
417
Interpret Results
The canonical correlation analysis indicated that top movement and bottom movement of belly dancers were
statistically significantly related on two dimensions. The first canonical variate (dimension) had r = .91 (F =
5.62, df = 4, 8, p = .018). The first set of canonical loadings indicated that top circle (.68) and bottom circle
(.90) were opposite top shimmy (− .62) and bottom shimmy (− .48). The second canonical variate (dimension)
had r = .76 (F = 6.94, df = 1, 5, p = .046). The second set of canonical loadings indicated that top shimmy
(.79), top circle (.74), and bottom shimmy (.87) were mostly related, although bottom circle (.43) had a
positive weight. The effect sizes for the canonical variates were 83% (eigenvalue = .83), since canonical r1 =
.91, and 58% (eigenvalue = .58), since canonical r2 = .76, respectively. The two dimensions overlap and thus are not orthogonal. The plot indicates that belly dancers 3 and 6 were high on both dimensions, thus moving and
shaking both the top and bottom. Belly dancer 4 was high on the first dimension, so her movements were
mostly top and bottom circles.

Note: Interpretation of the belly dancers is directed toward whether they are high or low on the two
dimensions. In some cases, they are high on both dimensions or low on both dimensions. The clearer you can
be on what the dimensions represent, the clearer the interpretation.

418
419
Chapter 9: Exploratory Factor Analysis
1. (a) Correlations are not multicollinear (no singularity/identity matrix), (b) correlation matrix is not a
nonpositive definite matrix, (c) positive determinant of correlation matrix, (d) adequate sample size, and
(e) interitem correlations are positive (reliability).
2. Factor analysis reduces the number of variables into a smaller set of factors. The factors are identified by
the common shared variance among the variables. The contribution of each variable is identified by their
communality (h2). Principal components analysis determines components that provide weighting of the
observed variables. A component score is derived from the linear weighting of the observed variables.
3. The regression method has a mean = 0 and variance = h2 (communality estimate). It results in the highest correlation between factor and factor scores. The Bartlett method has a mean = 0 and variance = h2
(same as regression method), but factor scores only correlate with their factor. Anderson–Rubin
produces factor scores with mean = 0 and standard deviation = 1. It results in factor scores that are
uncorrelated with each other.
4. EFA using Harman.8 data in psych package.
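
A hedged sketch of the commands (the Harman.8 correlation matrix of 8 physical variables measured on 305 girls is included in the psych package; the rotation and estimation options the chapter used may differ):

library(psych)
data(Harman.8)
scree(Harman.8)                             # scree plot of the eigenvalues
fa(Harman.8, nfactors = 3, n.obs = 305)     # first run with three factors
fa(Harman.8, nfactors = 2, n.obs = 305)     # two common factors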

420
421
The EFA with nfactors = 3 displays two common factors and a unique factor.

The factor analysis with two common factors and a unique factor more clearly shows a two factor structure
indicated by the scree plot.

3. Report results

The 8 physical characteristics of the 305 girls can be explained by two factors (constructs). Height, arm span, forearm, and leg length measurements go together (share common variance) and are labeled lankiness. Weight, hip, chest girth, and chest width variables go together (share common variance) and are labeled stockiness. Therefore, lankiness and stockiness are two distinguishing characteristics of the 305 girls.

Note: We could output the factor scores on these two factors and create scaled scores from 0 to 100 to provide
a meaningful interpretation of the lankiness and stockiness constructs (traits).

423
Chapter 10: Principal Components Analysis
1. Principal components analysis is a data reduction method designed to explain variable variance in one or
more components. It computes eigenvalues that represent the distribution of variable variance across the
extracted principal components.
2. The determinant of a matrix is a measure of freedom to vary; it indicates whether an inverse matrix can be computed, which is needed to obtain eigenvalues and eigenvectors.
3. Eigenvalue is a measure of generalized variance. In principal components analysis, it is the SS loading
for each extracted component. The sum of the eigenvalues will equal the sum of the variable variances.

Eigenvectors are the principal component weights used to compute the component scores. It is recommended
that the component scores be converted to scaled scores from 0 to 100 for meaningful interpretation.

4. The following R commands produce the summary output for answer.
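
A hedged sketch of such commands, assuming the seven variables (rating, complaints, privileges, learning, raises, critical, advance) come from R's built-in attitude data set:

library(psych)
data(attitude)
R <- cor(attitude)
det(R)                                       # determinant of the correlation matrix
cortest.bartlett(R, n = nrow(attitude))      # Bartlett test of sphericity
KMO(R)                                       # Kaiser-Meyer-Olkin sampling adequacy
pca.out <- principal(attitude, nfactors = 5, rotate = "none")
pca.out                                      # SS loadings (eigenvalues) and component weights
scree(R)                                     # scree plot
alpha(attitude)                              # Cronbach's alpha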

The determinant of the matrix is positive (13273689529754), the Bartlett chi-square is statistically significant
(chi-square = 98.75, p < .001), and KMO (.76) is close to 1.0. These three assumptions indicated that it is
okay to proceed with principal components analysis (PCA).

The PCA was run with 5 components for the 7 variables. It indicated two eigenvalues > 1, PC1 (3.72) and
PC2 (1.14). This was confirmed by the scree plot. The two components extracted 53% (PC1) and 16% (PC2),
with the remaining variance spread across the three remaining components. Cronbach’s α = .84, which
indicates a high level of internal consistency of response.

PC1 comprises rating, complaints, privileges, learning, and raises based on component weights.

PC2 comprises critical and advance based on component weights.

PC1 is named job satisfaction.

PC2 is named negativity toward job.

Note: The sum of the eigenvalues (SS loadings) is equal to the sum of the variances in the diagonal of the
variance–covariance matrix.

Note: Each row contains the component weights for the linear combination of a variable.

427
Chapter 11: Multidimensional Scaling
1. The classical or metric MDS analysis enters exact distances in the proximity matrix—for example,
distances between cities. The nonmetric MDS analysis enters self-reported ordinal distances in the
proximity matrix—for example, responses to Likert-type scaled survey questions.
2. The direct method assigns a numerical value to indicate the distance between pairs of objects. The
indirect method uses data from subjects who rate pairs of objects to express their perception of similarity
or dissimilarity.
3. STRESS is a goodness of fit index with 0 indicating a perfect model fit. It is affected by the number of
dimensions expressed in the solution. A value greater than .20 is a poor model fit. It is a subjective
measure.
4. The amount of generalized variance explained by the MDS solution can be expressed as P2 or Mardia
criteria. P2 is the ratio of the sum of the eigenvalues over the total sum of the eigenvalues. Mardia
criteria squares the numerator and denominator of the P2 values. Both P2 and Mardia criteria are scaled
from 0 to 1, with values closer to 1.0 indicating a good fit.
5. The number of dimensions is a critical part of the MDS solution. Too few dimensions and the objects
are not distinguished, while too many dimensions would indicate every object as defining its own
dimension. The scree plot provides a good indication of the number of eigenvalues greater than 1.0 in
the proximity matrix. Dimensions with eigenvalues greater than 1.0 yield significant amounts of
explained variance.
6. Classical MDS analysis is conducted as follows:
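
A hedged sketch of the analysis, using the burt correlation matrix from the psych package (eigen() is used here in place of the scree.plot() function named below):

library(psych)
data(burt)                                      # Burt's 11 emotional variables
plot(eigen(burt)$values, type = "b")            # eigenvalues of the correlation matrix
burt.dist <- dist(burt)                         # proximity (distance) matrix
mds.out <- cmdscale(burt.dist, k = 2, eig = TRUE)
sum(mds.out$eig[1:2]) / sum(abs(mds.out$eig))   # proportion of variance (P2-type index)
plot(mds.out$points, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(mds.out$points, labels = rownames(burt))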

The burt data set was input as a correlation matrix. The scree.plot() function used the burt data set to extract
and plot eigenvalues. The scree plot indicated three dimensions—that is, three eigenvalues greater than 1.0.
The classical (metric) MDS analysis used the cmdscale() function with a proximity matrix and two
dimensions. The proximity matrix was created using the dist() function. Results indicated that 75% of the
variance relation among the 11 emotional variables was explained (P2 and Mardia criteria = .75). A plot of the
two dimensions displayed a separation in the 11 emotional variables. The Shepard diagram indicated a fairly
stable monotonic increasing trend along a line of fit.

431
Note: Would the results be similar if we used a nonmetric MDS with the correlation-to-distance function in the psych package, cor2dist()?

432
Chapter 12: Structural Equation Modeling
1. A nonpositive definite matrix can occur for many reasons, but the basic explanation is that the matrix values do not permit the calculation of parameter estimates. If a matrix has a determinant of zero, then its inverse does not exist, because computing the inverse involves division by the determinant, and division by zero is inadmissible. Similarly, if the eigenvalues of the matrix are zero or negative, then there is no generalized variance and no solution to the set of simultaneous equations.
2. The determinant of a covariance (correlation) matrix yields the generalized variance of the matrix. The generalized variance takes the covariance into account; loosely, the determinant is the variance remaining after the shared covariance is removed (for a 2 × 2 covariance matrix, the product of the variances minus the squared covariance). It is calculated by expanding along a row or column of the matrix, multiplying each element by its cofactor and summing. The trace is the sum of the diagonal values of the matrix, whereas the determinant reflects both the variances and the covariances.
3. Eigenvalues are the amount of variance for a specific set of eigenvector weights in a set of simultaneous
equations. For example, in factor analysis, more than one factor structure is possible—that is, subset of
variables. When a subset is given, each factor has variance (eigenvalue)—that is, sum of the factor
loadings squared (communality). The solution, however, is considered indeterminate, because other
solutions are possible—that is, other eigenvectors with corresponding eigenvalues. If the rank of a
matrix is 3, then there are three nonzero eigenvalues with associated eigenvectors.
4. Observed variables have a scale—that is, mean and standard deviation. A latent variable is created from
the observed variables without any scale (reference point). A latent variable by default is assigned a mean
= 0 and variance = 1. If an observed variable is assigned to the latent variable, generally by using the
value of 1, then the mean and the standard deviation of that observed variable are assigned to the latent
variable. The process of assigning the observed variable scale to the latent variable is referred to as
reference scaling.

433
The results indicate a good model fit, χ2 = 3.92, df = 5, p = .56. An increase in the intercept and a linear trend in the slope values are supported (see column Std.all). The intercept and slope are not significantly correlated.

Latent variables:

434
435
15 R Installation and Usage

436
Introduction to R
R is free, open-source software that can run on Unix, Windows, or Mac OS X computer operating
systems. Once the R software is installed, additional software packages or routines are available from an
extensive library. You first select a CRAN (Comprehensive R Archive Network) site near you, then use the
main menu to select Load Package. Knowing which R package to use will take some experience. Once an R
package is loaded, you access it by simply issuing the command, library(x), where x is the name of the
package. Let’s get started by downloading and installing the R software on your computer or laptop.

437
Download and Installing R
The R software can be downloaded from the CRAN, which is located at URL http://cran.r-project.org/.
There are several sites (servers) around the world from which the software can be downloaded; these mirror sites are listed at http://cran.r-project.org/mirrors.html and are referred to as CRAN mirrors. The R version for Windows will be used here, so if using Linux or Mac OS X operating systems, follow the instructions on
the CRAN website.

After entering the URL http://cran.r-project.org/ you should see the following screen.

After clicking on the “Download R for Windows”, the following screen should appear where you will click on
“base” to go to the next screen for further instructions.

After clicking on "base," the following screen should appear to download the Windows installer executable file, for example, R-3.0.1-win.exe. (The version of R available for download will change periodically as updates become available; this is version R 3.0.1 for Windows.)

Note: FAQs are available to answer questions about updating packages, etc. Click on the underlined Download
R 3.0.1 for Windows to begin installation.

You will be prompted to Run or Save the executable file, R-3.0.1-win.exe. Click on Run to install, or once the
file has been downloaded, simply double-click on the file name, R-3.0.1-win.exe, which will open the R for
Windows setup wizard below.

Note: The Download R 3.0.1 for Windows version will have changed to a newer version, so simply download
the latest version offered.

You will be prompted with several dialog box choices. Simply follow the instructions to complete the
installation. For example, the first dialog box will install core files, 32-bit files, and 64-bit files (uncheck the
64-bit box if your computer is not 64-bit compatible).

438
Getting Help
The R icon should appear on your desktop with the version number underneath. Click on this R icon to open
the R software. The following window should appear:

You can access additional R manuals, references, and material by issuing the following command in the RGui
window:
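
As noted in the TIPS at the end of this chapter, the command is:

> help.start()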

For example, click on “Packages” under the heading “Reference.” This will open a dialog box with a library
directory of R packages.

439
If we select the “base” package, another dialog box with related R functions will open. Now we can select
specific R functions that are listed A to Z in the documentation.

For example, if we select, abs, the specific R function and argument (x) required are displayed for obtaining
the absolute value of a number.

440
To illustrate using the RGui window, enter the R function for the number, −10, and the absolute value, a
positive 10, will be computed as follows:
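
For example, in the RGui Console window:

> abs(-10)
[1] 10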

441
Download and Install R
Precompiled binary distributions of the base system and contributed packages, Windows and Mac users most likely want one of
these versions of R:

Download R for Linux


Download R for MacOS X
Download R for Windows

R is part of many Linux distributions, you should check with your Linux package management system in addition to the link above.

442
R for Windows
Subdirectories:

Please do not submit binaries to CRAN. Package developers might want to contact Duncan Murdoch or Uwe Ligges directly in case
of questions/suggestions related to Windows binaries.

You may also want to read the R FAQ and R for Windows FAQ.

Note: CRAN does some checks on these binaries for viruses, but cannot give guarantees. Use the normal precautions with
downloaded executables.

R–3.0.1 for Windows (32/64 bit)

Download R 3.0.1 for Windows (52 megabytes, 32/64 bit)

Installation and other instructions

New features in this version

Online Documentation
A comprehensive Introduction to R is available online at

http://cran.r-project.org/doc/manuals/R-intro.html

The URL should open with the following heading and table of contents (abbreviated here). It covers
everything from A to Z that you may want or need to know if you choose to become more involved in using
R. It covers the basics: reading data files, writing functions, statistical models, graphical procedures, and
packages, etc.

443
Update R Software Version
I have found it very easy to update or install the latest version of R for Windows from the CRAN website.
You simply need to uninstall the older version of R. You do this by going to Start > Control Panel > Uninstall
Programs, then find the older version of R and click on it to uninstall. Now go back to the URL http://cran.r-
project.org/ and repeat the download instructions and run the latest Windows executable file. I have found
this to be the easiest and quickest way to update the R software version.

Note: Many of the R functions require a certain version of the R software, usually a newer version, and
generally, you will be notified when running an R function if it is not compatible.

Load, Install, Update R Packages


Once R is installed and the RGui window appears, you can load, install, or update packages and functions
that are not in the “base” package by using the main menu. Simply click on “Packages” in the main menu of
the RGui window, then make your selection, for example, “Load packages.”

444
A dialog box will appear that lists the base package along with an alphabetical list of other packages. I selected
the stats package from the list and clicked OK. This makes all of the routines or commands in the stats
package available. Prior to running any R commands in the RGui Console window, you will need to load the
package using the following command:
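
For example, to load the stats package selected above:

> library(stats)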

To obtain information about the R stats package, issue the following command in the RGui Console window:
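
As listed in the TIPS at the end of this chapter, the command is:

> library(help = "stats")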

445
This will provide a list of the functions in the stats package. An index of the statistical functions available in
the stats package will appear in a separate dialog box. The various functions are listed A to Z with a
description of each. You should become more familiar with selecting a package and using certain functions as
you navigate through the various statistical methods in the chapters of the book.

Running R Functions
To run R functions or commands, you will Click on File, then select New Script or Open Script from the main
menu in the RGui window. Create and save your script file or locate a R script file in your computer directory.

For example, Chap1.r script file will open in a separate R Editor window.

446
The R script file is run by first clicking on Edit in the pull-down menu and Select all to select all of the
command lines in the R script file. Next, click on the run icon (middle of main menu), and results will appear
in the RGui Console window. Optionally, click on Edit, then Run All. If syntax errors occur, they will appear
in the RGui Console window with little or no output provided. You can correct your errors in the R script
file, save the file, then rerun. The variable, total, specifies the values that will be read into the chap1()
function.

The chap1.r function computes basic summary statistics, combines them into a data frame, assigns names,
then prints the results.
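
The actual chap1.r script is not reproduced here; the following is a hypothetical sketch of a function of that kind, which computes summary statistics for a vector of values, combines them into a data frame, assigns names, and prints the result:

chap1 <- function(total) {
  out <- data.frame(Mean = mean(total),
                    SD   = sd(total),
                    Min  = min(total),
                    Max  = max(total),
                    N    = length(total))
  rownames(out) <- "Summary"
  print(out)
}

total <- c(10, 20, 30, 40, 50)   # example values read into the function
chap1(total)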

You do not need to create functions, because functions are provided in the different R packages to perform many of the statistical operations. For example, mean() is a function that returns the mean of a continuous variable.

I have italicized the names of R packages and boldfaced the names of the functions. Most functions require
specifying certain arguments, which indicate the operations to be performed. The information required by the different functions used is explained in each chapter. The R packages, functions, data sets, and script files used in the chapters are referenced in the appendix.

448
TIPS:
✓ Use help.start() to find R manuals and documentation
✓ Use main menu to load packages
✓ Use library(x) to load R package where x = name of package
✓ Use library(help="stats") to obtain a list of R functions in a package
✓ Use Edit menu with Run All command to run R script files
✓ Use http://cran.r-project.org/doc/manuals/R-intro.html for Introduction to R

449
16 R Packages, Functions, Data Sets, and Script Files

Index

Note: All bolded entries denote functions.

A + HE-1, 58
A priori power estimation, 50–52
ach1, 63
ach2, 63
ade4 package, 235
Adjusted means of dependent variables, 87–93
Advanced latent growth model, 304–310
aes() function, 110, 117
Analysis of covariance. See ANCOVA
ANCOVA
assumptions, 83
description of, 82, 258
extraneous variables, 82
multivariate. See MANCOVA
Anderson, Theodore W., 100, 196
Anderson–Bahadur algorithm, 196
Anderson–Darling test, 14–15, 196
Anderson–Rubin test, 196–197
ANOVA, 82–83
anova() function, 74, 288–289
aov() function, 88
ape package, 235
Areas under the normal curve, 316
as.matrix() function, 305
Assumptions
ANCOVA, 83
ANOVA, 82–83
canonical correlation, 149–150
discriminant analysis, 133–134
exploratory factor analysis, 173–176
factor analysis, 173–176
Hotelling T2, 29–30
MANOVA, 58–66
multidimensional scaling, 232–234
multivariate repeated measures, 101

453
multivariate statistics, 12–23
principal components analysis, 209–210, 219–220
structural equation modeling, 258–263
attach() function, 105
Attitude Toward Research instrument, 201
Attitude Toward Science instrument, 208

Bartlett, Maurice S., 196


Bartlett method, 197
Bartlett test, 21, 174, 182, 209–210
Bartlett’s test statistic, 29
Bivariate covariance, 2
Box, George E. P. “Pel,” 10–11, 131
Box M test, 10, 21–23, 41, 54, 135–136, 139–141, 263
boxM() function, 22, 65, 135, 264
BT-1, 72
BW-1, 72

cancor() function, 142, 150–151


Canonical correlation
assumptions, 149–150
canonical variates, 148, 152, 161, 164–166
CCA package, 150, 152–157
dependent variables, 148
effect size, 165
example of, 158–165
F test for, 166
formula for, 148
independent variables, 148
interpreting of results, 165
missing data and, 149
multicollinearity and, 149
nonnormality issues, 149
outliers and, 149
overview of, 148–149
purpose of, 165
R packages, 150–158
reporting of results, 165
sample size and, 149
summary of, 166–167

454
yacca package, 158
Canonical correlation coefficient, 148
Canonical loadings, 163
cc() function, 153, 161
cca() function, 142, 158, 163
CCA package, 150, 152–157
Central limit theorem, 30
CFA. See Confirmatory factor analysis
cfa() function, 275–276, 285
chisq.test() function, 138
Chi-square difference tests, 288–289
Chi-square distribution for probability levels, 319–320
Chi-square test, xvi, 137–138, 142
Class method, 77
Class size, MANOVA, 77
Classification summary, 136–137, 141
cmdscale() function, 235, 238–239
Cochran, William Gemmell, 81–82
Cohen’s d, 52
Compound symmetry, 11, 112
Comprehensive R Archive Network, 5
comput() function, 162
Confirmatory factor analysis, 173, 257
Confirmatory factor analysis models
basic, 275–282
bifactor model, 274, 276, 283
chi-square difference tests, 288–289
graphing models, 289–290
multiple group model, 282–290
psych package, 274
Continuous dependent variable, 83
cor() function, 265
cor2dist() function, 235, 246
cor2pcor() function, 268
Correlation coefficient
description of, 2
factors that affect, 3–4
Pearson. See Pearson correlation coefficient
squared canonical, 148, 167
Correlation matrix

455
exploratory factor analysis, 178
principal components analysis, 178, 208, 227
structural equation modeling, 264–271
corr.p() function, 180, 218
cortest.bartlett() function, 181
cortest.mat() function, 267
cov() function, 20, 266
Covariate variables, MANCOVA, 84
cov2cor() function, 17, 20, 218, 266
cov.test() function, 61
Cox, Gertrude, 81
Cramér, Harald, 256
Cramér–von Mises test, 14–15
Cramér–Wold theorem, 263
Cronbach’s alpha, 176, 180, 182, 222

Darwin, Charles, 131


Data files
formatting of, xv
input, xv–xvii
Data sets
description of, xiv–xv
exploratory factor analysis, 179–180
principal components analysis, 216–218, 224
Dependent variables
adjusted means of, 87–93
continuous, 83
dichotomous, 134–138
multivariate repeated measures, 99
in multivariate statistics, 11
polytomous, 138–142
positive correlation, 41
describeBy() function, 19, 88, 107–108, 116
det() function, 16–17, 212
Determinant of a matrix, 16–18, 24, 175, 210, 262
Dichotomous dependent variable, 134–138
Difference scores, 112–114
Directionality of hypothesis, 50
Discriminant analysis
assumptions, 133–134

box M test, 135–136, 139–141
chi-square test, 137–138, 142
classification summary, 136–137, 141
dichotomous dependent variable, 134–138
effect size, 142–143
goal of, 143
interpreting of results, 143–144
overview of, 133
polytomous dependent variable, 138–142
reporting of results, 143–144
summary of, 144
dist() function, 238, 249
Doubly multivariate repeated measures, 114–126
Dunn–Bonferroni adjustment, 28, 106

ecodist package, 236


EFA. See Exploratory factor analysis
effect() function, 88
Effect size, 50
canonical correlation, 165
discriminant analysis, 142–143
Hotelling T2, 49–50, 52–53
MANOVA, 76–78
eigen() function, 241
eigenvals() function, 235
Eigenvalues, 58, 68, 72, 133, 165, 167, 175, 186–187, 210, 213–216, 221–223, 226–227, 235–237, 241–242
Equal variance–covariance matrices
description of, 18–21, 24, 32
MANOVA, 63–66
error.bars() function, 35
Estimation methods, 189
Eta-square, 53, 76, 79
Euclidean distance function, 231–232
Exploratory factor analysis
assumptions, 173–176
commonality, 185
correlation matrix in, 178
data set input, 179–180
description of, 173

example of, 178–201
factor loadings, 183–190
factor scores, 195–200
factors used in, 183–190
graphical display, 201
interpreting of results, 201
multidimensional scaling versus, 231
oblique factors, 190–195
orthogonal factors, 190–195
principal components analysis versus, 176–178
psych package for, 183
R packages, 178–179
reporting of results, 201
sample size adequacy for, 180–183
scree plot, 185–190
summary of, 202–203
Extraneous variables, 82
ezANOVA() function, 106

F distribution for probability levels, 321–322


F test
canonical correlation, 166
Hotelling T2 significance testing using, 32
reporting of, 76
sphericity assumption, 101
fa() function, 183
Factor analysis
assumptions, 173–176
confirmatory, 173
exploratory. See Exploratory factor analysis
multicollinearity, 175
ordinal, 173
overview of, 172–173
principal components analysis versus, 176–178
summary of, 202–203
types of, 173
factor() function, 105
Factor loadings, 183–190
Factor scores, 195–200
Factorial MANOVA, 70–75, 79

factor.minres() function, 183
factor.pa() function, 183
factor.wls() function, 183
fa.diagram() function, 201
fa.parallel() function, 186, 223
fa.poly() function, 183
file.choose() function, xvi
Fisher, Ronald Aylmer, 2, 27, 57, 81, 131–132, 147
fitMeasures() function, 285, 292
F.test.cca() function, 158, 163

Galton, Sir Francis, 171, 207


geom_line() function, 110, 117
ggplot() function, 109–110, 117, 119
glm() function, 74
Goodness-of-fit index, 231, 236–237
Gosset, W. S., 132
G*Power 3 software, 50–51, 54
Greenhouse–Geisser correction, 101, 107
Group variance–covariance matrix, 18
growth() function, 298, 307

Hartley F distribution for probability levels, 323


help() function, 273
Heywood cases, 11
hist() function, 200
Histogram, 200
Holzinger and Swineford covariance matrix, 273
Homoscedasticity, 83, 93
Hotelling, Harold, 27–28, 147
Hotelling T2
assumptions, 29–30
development of, 27–28
effect size, 49–50, 52–53
F test used with, 32
interpreting, 54
multivariate hypothesis, 30–32
overview of, 28–29
power, 49–52
practical examples using R, 33–49

reporting, 54
single sample, 33–36
summary of, 54–55
two independent group mean difference, 36–42
two groups (paired) dependent variable mean difference, 42–49
univariate hypothesis, 30–32
Hotelling T.2() function, 33
Hotelling–Lawley multivariate statistic, 68
Huynh–Feldt correction, 101, 107
Hypothesis
directionality of, 50
multivariate, 30–32
null, in MANCOVA, 85
univariate, 30–32

lda() function, 135, 138, 143


Identity matrix, 210, 213, 262
Independent observations, 59–62
Independent variables, 11, 148
Indeterminacy, 189
Inferential statistics
factors that affect, 3
when not to use, 2–3
Input data files, xv–xvii
install.packages() function, 65
Interaction effect, 70
Internal consistency reliability, 182
Interpreting of results
canonical correlation, 165
discriminant analysis, 143–144
exploratory factor analysis, 201
Hotelling T2, 54
MANCOVA, 93–94
MANOVA, 78
multidimensional scaling, 251
multivariate repeated measures, 126–127
principal components analysis, 226–227
structural equation modeling, 310–311
Intraclass correlation, 59, 61
isoMDS() function, 235, 244, 246–247, 251

itemanal() function, 182

Jarque–Bera test, 13–14


Jöreskog, Dag, 270
Jöreskog, Karl Gustav, 256, 270

Kaiser–Meyer–Olkin test
description of, 174, 182
principal components analysis use of, 210, 219, 226
Kolmogorov–Smirnov test, 15
Kruskal, Joseph B., 230
Kurtosis, 174
Kurtotic data, 150

labdsv package, 236


lapply() function, 19
Latent growth model
advanced, 304–310
basic, 296–304
lavaan package, 266–267, 273–275
lavaan.diagram() function, 282, 289
Likert-type scale, 173
Linear discriminant equation, 133
lines() function, 243
LISREL, 256–257
lm() function, 74, 87
lme() function, 105–106
lmer() function, 121, 123–124
lme4 package, 121–126
lower2full() function, 267, 275, 283, 292, 305

Mahalanobis D2, 53
MANCOVA
adjusted means of dependent variable, 87–93
covariate variables, 84
description of, 84, 258
example of, 85–87
interpreting of results, 93–94
propensity score matching, 94–97
reporting of results, 93–94
Mann, Henry, 27

MANOVA. See also Multivariate repeated measures
assumptions, 58–66
class size, 77
description of, 58
discriminant analysis and, 133
effect size, 76–78
equal variance–covariance matrices, 63–66
factorial design example of, 70–75, 79
independent observations, 59–62
interpreting, 78
normality, 62–63
one-way design, 66–70, 79
reporting, 78
structural equation modeling versus, 258
summary of, 79
manova() function, 67, 85, 87, 113, 117, 121
Mardia criteria, 236
Mardia test of multivariate normality, 259
Martin, E. M., 132
MASS package, 244
matchit() function, 94
matcor() function, 152, 160
Matrix
designation for, 16
determinant of, 16–18, 24, 175, 210, 262
multivariate normality of, 263
trace of, 16
variance–covariance. See Variance–covariance matrix
within variance–covariance, 18
Mauchly test of sphericity, 101
MDS. See Multidimensional scaling
Mean, 199
Mean square error, 83
Measurement invariance, 282
melt() function, 104
Mendel, Gregor, 131
metaMDS() function, 235
Minnesota Multiphasic Personality Inventory, 100
Missing data
canonical correlation and, 149

factor analysis and, 174
Pearson correlation coefficient affected by, 150
mnps() function, 94
modindices() function, 278
mshapiro.test() function, 260
Multicollinearity
canonical correlation, 149
description of, 11
factor analysis, 175
Multidimensional scaling
assumptions, 232–234
classic, 231–232, 237–244
dimensions used in, 234
direct method of, 233, 252
Euclidean distance function as metric in, 231–232
exploratory factor analysis versus, 231
goodness-of-fit index, 231, 236–237
indirect method of, 233, 252
interpreting of results, 251
Mardia criteria, 236
metric example of, 237–244
model, 233
nonmetric example of, 244–251
overview of, 231–232
P2 criteria, 236
principal components analysis versus, 231
proximities, 231, 233
proximity matrix, 233, 241
R packages, 234–236
reporting of results, 251
sample size, 233–234
scree plot, 237
Shepard diagram, 237, 242–243, 249–250
STRESS value in, 236, 246, 252
summary of, 252
variable scaling, 234
Multivariate analysis of covariance. See MANCOVA
Multivariate analysis of variance. See MANOVA
Multivariate hypothesis, 30–32
Multivariate normality, 258–261

Multivariate repeated measures
advantages of, 102–103
assumptions, 101
dependent variables, 99
doubly, 114–126
examples of, 103–126
interpreting of results, 126–127
overview of, 99–100
profile analysis, 108–114, 127
reporting of results, 126–127
research designs using, 99
scholars involved in, 99
single dependent variable, 103–108
sphericity concerns in, 101
summary of, 127
Multivariate statistics
assumptions that affect, 12–23
background of, 1
Box M test, 21–23, 41, 54
data screening for, 209
dependent methods of, 1
dependent variables, 11
determinant of a matrix, 16–18, 24
interdependent methods of, 1
issues associated with, 11–12
multicollinearity effects on results of, 11
normality tests, 12–16
univariate statistics versus, 11, 28
variance–covariance matrix equality, 18–21, 24
Multivariate t test
description of, 106
single-sample, 33–36
two groups (paired) dependent variable mean difference, 42–49
two independent group mean difference, 36–42
mvnormtest R package, 63

names() function, 275


na.omit() function, 245
Negative variance, 11
nlmer() function, 121

Normality
Anderson–Darling test, 14–15
Cramér–von Mises test, 14–15
factor analysis, 174
Jarque–Bera test, 13–14
Kolmogorov–Smirnov test, 15
MANOVA, 62–63
multivariate, 12–16, 258–261
Pearson chi-square test, 15
Shapiro–Francia test, 15
Shapiro–Wilk test, 12–14
summary of, 24
nortest R package, 62
Null hypothesis, in MANCOVA, 85

Oblique factors, 190–195


One-tailed test, 317
One-way MANOVA, 66–70, 79
Ordinal factor analysis, 173
Orthogonal factors, 190–195
Outliers, 149

P2 criteria, 236
paf() function, 180, 182, 219
par() function, 201
Parallelism, 109–111
Partial eta-square, 53, 79
Partial least squares modeling, 268
PCA. See Principal components analysis
pcor2cor() function, 269
Pearson, Karl, 2, 207–208
Pearson chi-square test, 15
Pearson correlation coefficient, 268
description of, 150
factors that affect, 3–4, 173–174
linearity and, 174
missing data and, 150, 174
normality and, 174
sample size and, 174
Perceptual mapping, 232

pf() function, 124
plot() function, 75, 186, 201
Polytomous dependent variable, 138–142
Population standard deviation, 50
Power
a priori power estimation, 50–52
factors that affect, 49–50
predict() function, 138
Principal components analysis
assumptions, 209–210, 219–220
Bartlett test of sphericity, 209–210
basics of, 211–216
correlation matrix, 178, 208, 227
data set for, 216–218, 224
determinant of a matrix, 210
example of, 216–226
exploratory factor analysis versus, 176–178
factor analysis versus, 176–178
identity matrix, 210, 213
interpreting of results, 226–227
Kaiser–Meyer–Olkin test, 210, 219, 226
loadings for, 227
multidimensional scaling versus, 231
overview of, 208–209
principal component scores, 215–216
R packages for, 216
reporting of results, 226–227
scree plot for, 222–226
summary of, 227
variance–covariance matrix, 208, 212
principal() function, 183, 220
print() function, 180
Profile analysis, 108–114, 127
Propensity score matching, 94–97
Proximities, 231, 233
psych package, 19, 59, 183, 244, 274
Pythagorean theorem, 175

Quick-R, 7

r distribution for probability levels, 318

R packages
canonical correlation, 150–158
data sets, 367–373
description of, xvii–xviii
downloading of, 355–356
exploratory factor analysis, 178–179
functions, 364–373. See also specific function
getting help for, 358–361
installation and usage of, 355–366
multidimensional scaling, 234–236
mvnormtest, 63
nortest, 62
online documentation, 361
principal components analysis, 216
psych, 19, 59, 244
script files, 367–373
structural equation modeling, 271–275
updating, 361–364
R software, 5–7
Random assignment, 82
Rao, C. R., 57–58, 132
RCommander, xviii, 6
read.csv() function, 159
read.table() function, 179
Reference scaling, in structural equation models, 270–271
Reference variables, 270–271
Regression method, 197
Reliability
description of, 176
internal consistency, 182
rep() function, 108
Repeated measures design
advantages of, 102–103
multivariate. See Multivariate repeated measures
Reporting of results
canonical correlation, 165
discriminant analysis, 143–144
exploratory factor analysis, 201
F test, 76
Hotelling T2, 54

MANCOVA, 93–94
MANOVA, 78
multidimensional scaling, 251
multivariate repeated measures, 126–127
principal components analysis, 226–227
structural equation modeling, 310–311
Roy multivariate statistic, 68
RStudio, xviii, 6–7
Rubin, Herman, 197
Russell, John, 132

Sample size
canonical correlation, 149
description of, 49
exploratory factor analysis, 180–183
factor analysis and, 174
multidimensional scaling, 233–234
Sampling adequacy
for exploratory factor analysis, 180–183
Kaiser–Meyer–Olkin test for, 174, 182, 210
test for, 174
Scatter plot, 174
Scree plot
for exploratory factor analysis, 185–190
multidimensional scaling, 237
for principal components analysis, 222–226
scree.plot() function, 237, 242, 248
SEM. See Structural equation modeling
sem() function, 274
sem package, 273–274
SensoMineR package, 235
sep() argument, xv–xvi
Separate group variance–covariance matrix, 18
setwd() function, 179
Shapiro, Samuel Sanford, 9
Shapiro–Francia test, 9, 15
shapiro.test() function, 63
Shapiro–Wilk test, 10, 12–14, 260, 263
Shepard, Roger N., 230
Shepard diagram, 237, 242–243, 249–250

Shepard() function, 243, 249
Single dependent variable multivariate repeated measures, 103–108
Single-sample multivariate t test, 33–36
smacof package, 235
Snedecor, George, 81
Spearman, Charles Edward, 171–172
Sphericity
assumption, 11, 101
Bartlett test of, 174, 209–210
Mauchly test of, 101
in multivariate repeated measures, 101
single dependent variable multivariate repeated measures, 106–108
time effect, 115
SPSS, 23
Squared canonical correlation coefficient, 148, 167
Standard deviation, 199
Statistical tables, 315–323
Statistics
assumptions, 3
factors affecting, 2–5
multivariate. See Multivariate statistics
STRESS value, 236, 246, 252
Structural equation model(s)
basic, 290–295
basic latent growth model, 296–304
definition of, 290
dependent latent variables, 290
description of, 257
independent latent variables, 290
longitudinal, 295–310
reference scaling in, 270–271
summary of, 311
Structural equation modeling
assumptions, 258–263
confirmatory factor analysis model, 257
correlation functions, 265–267
correlation matrix, 264–271
covariance functions, 265–267
covariance matrix, 264–271
equal variance–covariance matrices, 263–264

identity matrix, 262
interpreting of results, 310–311
lavaan package, 266–267, 273–275
longitudinal growth models in, 295–310
MANOVA versus, 258
matrix input functions, 267–270
multivariate normality, 258–261
Mx, 257
OpenMx, 257
overview of, 257
positive definite matrix, 261–263
R packages, 271–275
reporting of results, 310–311
sem package, 273–274
software packages, 257, 273–275
structural equation models. See Structural equation model(s)
summary of, 311
structure.diagram() function, 294
summary() function, 113, 142, 274–275, 277, 293, 307
summary.aov() function, 68
Systematic bias, 84

t distribution for probability levels, 317


t() function, 260
t test
multivariate. See Multivariate t test
univariate, 32
Thurstone method, 197
Torgerson, Warren S., 229–230
Trace of a matrix, 16
TukeyHSD() function, 75
Two groups (paired) dependent variable mean difference multivariate t test, 42–49
Two independent group mean difference multivariate t test, 36–42
Two-tailed test, 317
Type I error, 28, 49, 51
Type II error, 50

Unidimensional, 173, 185


Univariate hypothesis, 30–32
Univariate statistics, multivariate statistics versus, 11, 28
Univariate t test, 32

Variance–covariance matrix
determinant of, 16
equal, 263–264
equality of, 18–21, 24, 32
principal components analysis, 208, 212
R commands for creating, 17
structural equation modeling, 263–264
vegdist() function, 237

Wald, Abraham, 28, 147


wcmdscale() function, 235
Web resources, 7
Wilk, Martin Bradbury, 10
Wilks's lambda (Λ), 58, 72, 133, 143
Wishart, John, 81
Within variance–covariance matrix, 18
Within-group SS, 84
Wold, Herman Ole Andreas, 255–256
WT⁻¹, 72

yacca package, 158


Yates, Frank, 81

z scores, 316
