British Journal of Mathematical and Statistical Psychology (2005), 58, 19–31
© 2005 The British Psychological Society

www.bpsjournals.co.uk

Fitting the factor analysis model in ℓ1 norm

Nickolay T. Trendafilov*
University of the West of England, Bristol, UK

The well-known problem of fitting the exploratory factor analysis model is reconsidered, where the usual least squares goodness-of-fit function is replaced by a more resistant discrepancy measure based on a smooth approximation of the ℓ1 norm. Fitting the factor analysis model to the sample correlation matrix is a complex matrix optimization problem which requires preservation of the structure of the unknown parameters (e.g. positive definiteness). The projected gradient approach is a natural way of solving such data-matching problems, as it is especially designed to follow the geometry of the model parameters. Two reparameterizations of the factor analysis model are considered. The approach leads to globally convergent procedures for simultaneous estimation of the factor analysis matrix parameters. Numerical examples illustrate the algorithms and the factor analysis solutions.

1. Introduction
This short paper is a sequel to Trendafilov (2003). It considers the problem of fitting the well-known correlation structure implied by the factor analysis (FA) model,

    R ≈ LLᵀ + C²,   (1)

to a sample p × p correlation matrix R (Bartholomew & Knott, 1999; Jöreskog, 1977; Harman, 1976). Here L is a p × q matrix of factor loadings and q is the number of common factors. The loading matrix L is required to have full column rank. The p × p diagonal matrix C² must be non-negative definite. Some authors require C² to be positive semi-definite (p.s.d.), but others find this unsatisfactory for interpretational reasons and require C² to be strictly positive definite (p.d.) (see Anderson, 1984).
The FA fitting problem is to find the unknown pair {L, C} which gives the best fit (for a certain q) to the (sample) correlation matrix R of the data with respect to some goodness-of-fit measure. The most frequently used goodness-of-fit measures are maximum likelihood (ML) and least squares (weighted and unweighted), for which a number of iterative algorithms are available (Bartholomew & Knott, 1999; Jöreskog, 1977; Harman, 1976). In Trendafilov (2003) the FA fitting problem was solved for the ML

* Correspondence should be addressed to Nikolay T. Trendafilov, Faculty of Computing, Engineering and Mathematical
Sciences, University of the West of England, Bristol BS16 1QY, UK (e-mail: nickolay.trendafilov@uwe.ac.uk).

DOI:10.1348/000711005X47168

and least squares (both weighted and unweighted) goodness-of-fit measures, making use of the projected gradient approach (Helmke & Moore, 1994).
In this paper the FA model (1) is fitted to the data R with respect to the ℓ1 matrix norm

    ‖E‖ℓ1 = Σᵢ,ⱼ |Eᵢⱼ|,

where the sum of the moduli of the errors is considered. This criterion in fact predates least squares (see Chapter 1 in Rousseeuw & Leroy, 1987, for a comprehensive introduction and further details). For brevity, the problem of fitting the FA model (1) to the data R with respect to the ℓ1 matrix norm is referred to as ℓ1 FA. Later the least squares FA will be referred to as ℓ2 FA.
The ℓ1 norm goodness-of-fit criterion is usually associated with robust statistical methods, for example regression analysis (Rousseeuw & Leroy, 1987). The obvious reason for this is that the ℓ1 norm goodness-of-fit criterion is less sensitive to large errors than the ℓ2 norm is. It is well known (Rousseeuw & Leroy, 1987) that for regression problems the ℓ1 norm goodness-of-fit criterion protects even from very large outliers if they are in the dependent variables. Unfortunately, when the input data are in the form of interrelationships (correlations for principal components and factor analysis, and/or other proximity data for multidimensional scaling and cluster analysis) the influence of the outliers in the raw data is multiplied through these interrelationships. Thus the correlation coefficient is not a robust measure. In general, one may either look for robust measures to overcome the outliers problem in the data for such multivariate techniques, or look for alternative techniques working directly with the raw data. The former approach to robust FA is adopted in Pison, Rousseeuw, Filzmoser, and Croux (2003), where the standard FA algorithms (Bartholomew & Knott, 1999; Jöreskog, 1977; Harman, 1976) are applied to a specific robust modification of the sample correlation matrix and preliminary outlier detection and diagnosis are performed (Rousseeuw & Van Driessen, 1999).
An important remark to be made here is that this paper is not concerned with robust FA as outlined above. This study is focused on the problem of finding a better fit of the FA model (1) to a given sample correlation matrix R by taking more care of the small deviations of the model from the data. While fitting the FA model to the correlation matrix R one eventually tries to minimize deviations smaller than 1 in magnitude. When the ℓ2 goodness-of-fit measure is employed, larger deviations (about 1 in magnitude) of the model from the data have relatively large squares, while the small deviations (say, smaller than 0.2) have negligible squares. Thus, ℓ2 fitting tries harder to minimize those large squares. This effect can be lessened if the ℓ1 goodness-of-fit measure is employed instead. This may be quite helpful when the FA model fits the correlation matrix R with respect to the ℓ2 norm and Heywood cases occur. Typically this results in at least one diagonal entry of the fitted correlation matrix slightly greater than 1, that is, in a small deviation from the corresponding entry of the sample correlation matrix. The ℓ2 goodness-of-fit criterion does not bother itself with this small error, because it is attempting to reduce other larger errors. The ℓ1 goodness-of-fit criterion dampens this effect and can result in a regular FA solution (not a Heywood case). Such a situation is illustrated by an example in Section 4. This is a natural reason why ℓ1 fitting may be advantageous in practice. At the same time one should realise that dampening the small deviation from the diagonal unit entry is not the only way that the application of the ℓ1 goodness-of-fit criterion influences the FA solution. Indeed, as is well known, maximum likelihood FA necessarily leads to FA solutions producing unit diagonal entries in the fitted correlation matrix, but such a solution can still be a Heywood case.
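The weighting argument above can be made concrete with a small numeric illustration. The residual values below are hypothetical, chosen only to mimic a Heywood-type diagonal overshoot of about 0.05 sitting among larger off-diagonal errors:

```python
import numpy as np

# Hypothetical residuals: one small Heywood-type diagonal deviation (0.05)
# among several larger off-diagonal errors.
errors = np.array([0.05, 0.3, 0.25, 0.2])

l1_terms = np.abs(errors)   # l1: each error contributes proportionally
l2_terms = errors ** 2      # l2: small errors are squashed by squaring

# Share of the total criterion carried by the 0.05 deviation:
print(l1_terms[0] / l1_terms.sum())   # 0.0625
print(l2_terms[0] / l2_terms.sum())   # ~0.0128
```

Under ℓ2 the small deviation carries roughly a fifth of the relative weight it has under ℓ1, which is the sense in which the ℓ2 criterion "does not bother itself" with it.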
To solve the ℓ1 FA problem the projected gradient approach (Chu & Trendafilov, 2001; Trendafilov, 2003) is applied, which is a powerful way to preserve the structure of the model unknowns L and C. The unknown matrix L is required to be of full column rank, which means it should belong to the non-compact Stiefel manifold of all p × q matrices with rank exactly q (Helmke & Moore, 1994). In the same manner as in Trendafilov (2003), one can derive a dynamical system for L on this non-compact Stiefel manifold. The problem with this approach is that the projection onto the non-compact Stiefel manifold involves computation of the inverse matrix (LᵀL)⁻¹, which may not be efficient and/or stable. To avoid this, a reparameterization of the FA model based on the eigenvalue decomposition (EVD) of LLᵀ in (1) was proposed in Trendafilov (2003). The same reparameterization of (1) is employed in this paper.
It would be helpful briefly to recall some notation from Trendafilov (2003). Consider the EVD of the p.s.d. matrix LLᵀ of rank at most q in (1), that is, let LLᵀ = QD²Qᵀ, where D² is a q × q diagonal matrix composed of the largest (non-negative) q eigenvalues of LLᵀ arranged in descending order and Q is a p × q orthonormal matrix containing the corresponding eigenvectors. Then the FA model (1) can be rewritten as:

    R ≈ QD²Qᵀ + C².   (2)

Thus, instead of the pair {L, C}, a triple {Q, D, C} will be sought (Trendafilov, 2003). Note that these new unknown parameters are unique and correspond to the canonical form of the factor solution introduced in Harman (1976, Section 8.7). In order to maintain the FA constraints, the triple {Q, D, C} should be sought such that Q is a p × q orthonormal matrix of full column rank, D ∈ D≠(q) and C ∈ D≠(p), with D≠(q) denoting the set of all q × q non-singular diagonal matrices. Thus the following feasible set is defined:

    C₊ = O(p, q) × D≠(q) × D≠(p),

where O(p, q) is the (compact) Stiefel manifold of all p × q orthonormal matrices of full column rank, that is,

    O(p, q) := {Q ∈ ℝ^{p×q} : QᵀQ = I_q}.
Additionally, the following feasible set is considered:

    C = O(p, q) × D(q) × D(p),

where the subspace of all q × q diagonal matrices is denoted by D(q). In most practical situations factor solutions with p.d. D² and C² can be achieved under the less restrictive constraint set C. This modified problem, as a rule, is computationally easier.
In this reparameterization the ℓ1 FA fitting problem is concerned with the following constrained optimization problems:

    minimize  ‖R − QD²Qᵀ − C²‖ℓ1   (3)

    subject to  (Q, D, C) ∈ C₊ or C.   (4)
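The decomposition behind this reparameterization is easy to check numerically. A minimal numpy sketch, using a randomly generated full-column-rank L rather than any data from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 5, 2
L = rng.standard_normal((p, q))            # hypothetical full-rank loadings

# EVD reparameterization: LL^T = Q D^2 Q^T, with D^2 holding the q largest
# eigenvalues in descending order and Q the corresponding eigenvectors.
vals, vecs = np.linalg.eigh(L @ L.T)       # eigh returns ascending eigenvalues
idx = np.argsort(vals)[::-1][:q]           # indices of the q largest
D2 = np.diag(vals[idx])
Q = vecs[:, idx]

assert np.allclose(Q.T @ Q, np.eye(q))     # Q lies on O(p, q)
assert np.allclose(Q @ D2 @ Q.T, L @ L.T)  # the reparameterization is exact
```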
Except for the construction considered above, where L is decomposed as QD, there are a number of ways to satisfy the constraint that L be of full column rank. Probably the simplest possible way is to require L to be a p × q lower triangular matrix, with a triangle of q(q − 1)/2 zeros. Let L(p, q) denote the linear subspace of all such p × q lower triangular matrices. The application of the projected gradient approach requires projection of the gradient of the objective function onto the constraint manifold. Note that the tangent space of L(p, q) is L(p, q) itself. The projection of a general p × q matrix X onto L(p, q) is denoted by l(X) and is a p × q lower triangular matrix, composed of the elements of X with the upper triangle of q(q − 1)/2 elements replaced by zeros.
For this new reparameterization, the ℓ1 FA fitting problem is concerned with the following constrained optimization problems:

    minimize  ‖R − LLᵀ − C²‖ℓ1   (5)

    subject to  (L, C) ∈ C₊ or C,   (6)

where the constraint sets C₊ and C should be redefined as follows:

    C₊ = L(p, q) × D≠(p),

and

    C = L(p, q) × D(p).

2. Lower triangular reparameterization

Note that the objective function of the FA fitting problem (5) can be rewritten in the following form:

    ‖R − LLᵀ − C²‖ℓ1 = trace[(R − LLᵀ − C²)ᵀ sign(R − LLᵀ − C²)].   (7)

The value of sign(R − LLᵀ − C²) can be approximated by tanh(γ(R − LLᵀ − C²)) for some sufficiently large γ, say γ = 1000. The gradient of this approximation to the right-hand side of (7) can now be computed following the standard rules for matrix differentiation (for convenience the objective function (7) is multiplied by .5); see Magnus and Neudecker (1988). Denoting Y := γ(R − LLᵀ − C²), and writing ⊙ for the elementwise (Hadamard) product, one has:

    ∇L = −[tanh(Y) + Y ⊙ cosh⁻²(Y)]L,   (8)

and

    ∇C = −[tanh(Y) + Y ⊙ cosh⁻²(Y)] ⊙ C.   (9)

Thus the solution of the ℓ1 FA fitting problem (5) and (6) is given by an initial value problem (IVP) for the system of two matrix ordinary differential equations (ODEs):

    dL/dt = l([tanh(Y) + Y ⊙ cosh⁻²(Y)]L),   (10)

and

    dC/dt = [tanh(Y) + Y ⊙ cosh⁻²(Y)] ⊙ C^a,   (11)

and some starting point for the flow. The power parameter a must be set to 1 if the constraint C is applied, and to 3 if C₊ is applied. Note that the term Y ⊙ cosh⁻²(Y) vanishes and tanh(Y) becomes simply sign(R − LLᵀ − C²) outside a tiny neighbourhood of zero, whose diameter depends on the magnitude of γ. Examples of numerical solutions of the system (10) and (11) for ℓ1 FA are given in Section 4.
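As a rough illustration of how the flow (10)-(11) might be integrated outside MATLAB, the sketch below uses scipy's solve_ivp on a small synthetic correlation matrix. The data, the time span and the run-to-fixed-time stopping rule are simplifications, not the paper's setup; only γ = 1000, the tolerances of Section 4, and the form of the right-hand side follow the text (with a = 1, i.e. the constraint set C):

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1)
p, q, gamma, a = 6, 2, 1000.0, 1

# Synthetic correlation matrix generated from a known lower triangular L.
L_true = np.tril(rng.uniform(0.2, 0.8, (p, q)))
R = L_true @ L_true.T + np.diag(rng.uniform(0.1, 0.4, p))
s = np.sqrt(np.diag(R))
R = R / np.outer(s, s)

def grad_factor(E):
    """tanh(gamma*E) + gamma*E * sech(gamma*E)^2, overflow-safe."""
    Y = gamma * E
    sech2 = np.cosh(np.minimum(np.abs(Y), 30.0)) ** -2.0
    return np.tanh(Y) + Y * sech2

def rhs(t, x):
    L = np.tril(x[:p * q].reshape(p, q))   # L stays in L(p, q)
    c = x[p * q:]                          # diagonal of C
    G = grad_factor(R - L @ L.T - np.diag(c ** 2))
    dL = np.tril(G @ L)                    # projection l(.) of eq. (10)
    dc = np.diag(G) * c ** a               # diagonal flow of eq. (11)
    return np.concatenate([dL.ravel(), dc])

def l1_fit(x):
    L = np.tril(x[:p * q].reshape(p, q))
    return np.abs(R - L @ L.T - np.diag(x[p * q:] ** 2)).sum()

# Random start; error tolerances as quoted in Section 4.
x0 = np.concatenate([np.tril(rng.uniform(0.1, 0.5, (p, q))).ravel(),
                     np.full(p, 0.5)])
sol = solve_ivp(rhs, (0.0, 30.0), x0, method='LSODA', rtol=1e-5, atol=1e-9)
print(l1_fit(x0), '->', l1_fit(sol.y[:, -1]))   # the fit decreases along the flow
```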

For comparison, the ODEs for solving the ℓ2 FA fitting problem for this particular reparameterization are as follows:

    dL/dt = l[(R − LLᵀ − C²)L],   (12)

and

    dC/dt = (R − LLᵀ − C²) ⊙ C^a,   (13)

and some starting point for the flow.

3. EVD reparameterization
Here the EVD reparameterization of LLᵀ in (1) is considered. As in Section 2, the objective function of the FA fitting problem (3) can be rewritten in the following form:

    ‖R − QD²Qᵀ − C²‖ℓ1 = trace[(R − QD²Qᵀ − C²)ᵀ sign(R − QD²Qᵀ − C²)].   (14)

The value of sign(R − QD²Qᵀ − C²) can be approximated by tanh(γ(R − QD²Qᵀ − C²)) for some sufficiently large γ, say γ = 1000. The solution of the ℓ1 FA fitting problem (3) and (4) for this particular reparameterization is given by an IVP for the following system of three matrix ODEs, where for convenience the objective function (14) is multiplied by .5. Denoting Y := γ(R − QD²Qᵀ − C²), one has:

    dQ/dt = (Q/2)[Qᵀ[tanh(Y) + Y ⊙ cosh⁻²(Y)]Q, D²]
            + (I_n − QQᵀ)[tanh(Y) + Y ⊙ cosh⁻²(Y)]QD²,   (15)

    dD/dt = Qᵀ[tanh(Y) + Y ⊙ cosh⁻²(Y)]Q ⊙ D^a,   (16)

and

    dC/dt = [tanh(Y) + Y ⊙ cosh⁻²(Y)] ⊙ C^a,   (17)

and some starting point for the flow. The Lie bracket notation [X, Y] = XY − YX is used in (15). The power parameter a must be set to 1 if the constraint C is applied, and to 3 if C₊ is applied. Examples of numerical solutions of the system (15)-(17) for ℓ1 FA are given in Section 4.
For comparison the reader is referred to the ODEs (34)-(36) in Trendafilov (2003) derived for solving the ℓ2 FA fitting problem for this particular reparameterization.
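A companion sketch for the EVD flow (15)-(17), in the same hedged spirit (synthetic R rather than the paper's data, constraint set C, rational start from the EVD of R as described in Section 4); the Lie bracket term keeps Q on the Stiefel manifold up to integration error:

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(2)
p, q, gamma, a = 6, 2, 1000.0, 1
M = rng.uniform(0.2, 0.8, (p, q))
R = M @ M.T + np.diag(rng.uniform(0.1, 0.4, p))
s = np.sqrt(np.diag(R))
R = R / np.outer(s, s)                      # synthetic correlation matrix

def grad_factor(E):
    Y = gamma * E
    return np.tanh(Y) + Y * np.cosh(np.minimum(np.abs(Y), 30.0)) ** -2.0

def unpack(x):
    return x[:p * q].reshape(p, q), x[p * q:p * q + q], x[p * q + q:]

def rhs(t, x):
    Q, d, c = unpack(x)
    D2 = np.diag(d ** 2)
    G = grad_factor(R - Q @ D2 @ Q.T - np.diag(c ** 2))
    A = Q.T @ G @ Q                                # symmetric q x q
    dQ = 0.5 * Q @ (A @ D2 - D2 @ A) \
         + (np.eye(p) - Q @ Q.T) @ G @ Q @ D2      # eq. (15)
    dd = np.diag(A) * d ** a                       # eq. (16), diagonal part
    dc = np.diag(G) * c ** a                       # eq. (17)
    return np.concatenate([dQ.ravel(), dd, dc])

def l1_fit(x):
    Q, d, c = unpack(x)
    return np.abs(R - Q @ np.diag(d ** 2) @ Q.T - np.diag(c ** 2)).sum()

# Rational start from the EVD of R (Section 4).
w, V = np.linalg.eigh(R)
Q0, d0 = V[:, ::-1][:, :q], np.sqrt(w[::-1][:q])
c0 = np.sqrt(np.clip(np.diag(R - Q0 @ np.diag(d0 ** 2) @ Q0.T), 1e-4, None))
x0 = np.concatenate([Q0.ravel(), d0, c0])

sol = solve_ivp(rhs, (0.0, 30.0), x0, method='LSODA', rtol=1e-5, atol=1e-9)
Qf, df, cf = unpack(sol.y[:, -1])
print(l1_fit(x0), '->', l1_fit(sol.y[:, -1]))
```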

4. Numerical experiments
In this section, some numerical experiments with equations (10)-(11), (12)-(13) and (15)-(17) are reported. The computations are carried out in MATLAB 5.3 on a PC under Windows NT. The numerical integrator ode15s from the MATLAB ODE suite (Shampine & Reichelt, 1997) is applied to the IVPs. The code ode15s is a quasi-constant step size implementation of the Klopfenstein-Shampine family of numerical differentiation formulae for stiff systems.

In our experiments, the tolerance for absolute error is set at 10⁻⁹ and for relative error at 10⁻⁵. This criterion is used to control the accuracy in following the solution path. The integration terminates automatically when the relative improvement of the objective function between two consecutive output points is less than 10⁻⁷, indicating that a local minimizer has been found. This stopping criterion can be modified if necessary. In all experiments γ = 1000.
In the following numerical experiments, initial (starting) values for solving the ODEs for the corresponding ℓ1 FA formulations from Sections 2 and 3 are required. As in any other method requiring an initial value, the starting guess for the solution can be crucial for the performance of the algorithm. In general, there are two types of initial values: rational and random. Usually a good rational start at least saves CPU time. Nevertheless it is always advisable to run the algorithm with some random starts as well.
In the numerical experiments the following rational starts were used. Initial values for solving the ODEs (10)-(11) and (12)-(13) are taken as follows. Compute the Cholesky factorization of R, say R = LLᵀ. For some given q, form the p × q matrix L_q containing the first q columns of L, which is taken as an initial value L₀ for the L-flow. The initial value for C is calculated as C₀ = diag(R − L₀L₀ᵀ + .005 I_n)^{1/2}, or is simply taken to be a diagonal matrix with diagonal entries uniformly distributed in (0, 1).
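The Cholesky-based start can be sketched in a few lines of numpy; the 3 × 3 matrix R below is purely illustrative, not the paper's data:

```python
import numpy as np

R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
q = 1

Lchol = np.linalg.cholesky(R)     # R = L L^T with L lower triangular
L0 = Lchol[:, :q]                 # first q columns: automatically in L(p, q)
# C0 = diag(R - L0 L0^T + 0.005 I)^(1/2)
C0 = np.diag(np.sqrt(np.diag(R - L0 @ L0.T) + 0.005))
print(np.diag(C0))                # strictly positive diagonal start
```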
Initial values for solving the ODEs (15)-(17) are taken as follows (Trendafilov, 2003). Compute the EVD of R, say R = QD²Qᵀ. For some given q, form the q × q diagonal matrix D²_q containing the largest q eigenvalues only, and form the p × q matrix Q_q containing the corresponding q eigenvectors. Note that Q₀D₀²Q₀ᵀ is the best ℓ2 approximation of R of rank q. The initial value for C is given simply by C₀ = diag(R − Q₀D₀²Q₀ᵀ)^{1/2}.
The projected gradient algorithms are illustrated by solving the ℓ1 FA fitting problem for the classical FA example with five socio-economic variables (Harman, 1976, Table 2.1). These data are analysed by a number of different methods and numerical techniques in Harman (1976): ML, MINRES, principal FA, etc. Contemporary statistical software solutions for these data are available in SAS (1990, pp. 815-882). In Trendafilov (2003) this data set was fitted by the FA model (1) in terms of the ML and ℓ2 goodness-of-fit functions, making use of the projected gradient approach.
First, the ℓ1 FA fitting problem is solved as defined by the lower triangular reparameterization, in which the ODEs (10) and (11) are integrated numerically. In Table 1 the one-, two- and three-factor solutions of this problem constrained by C² ≥ 0 are reported. In Table 2 the solutions of the same problem constrained by C² > 0 are reported.

Table 1. ℓ1 solutions with C for lower triangular reparameterization

            1-factor solution    2-factor solution            3-factor solution
            fit = 1.82680282     fit = 0.05478626             fit = 0.00000000
            L        C²          L1       L2       C²         L1       L2       L3       C²
POP         0.0228   0.9995      0.9912  −0.0000   0.0176     0.9908  −0.0000   0.0000   0.0184
SCHOOL      0.8760   0.2326      0.0104   0.8786   0.2279     0.0098   0.9454  −0.0000   0.1061
EMPLOY      0.1760   0.9690      0.9812   0.1026   0.0268     0.9816   0.1530   0.1096   0.0011
SERVICES    0.7894   0.3768      0.4428   0.7818   0.1927     0.4430   0.7267  −0.2860   0.1938
HOUSE       0.9851   0.0296      0.0221   0.9821   0.0350     0.0226   0.9127  −0.3647   0.0335

Table 2. ℓ1 solutions with C₊ for lower triangular reparameterization

            1-factor solution    2-factor solution            3-factor solution
            fit = 1.82680282     fit = 0.05478626             fit = 0.00000000
            L        C²          L1       L2       C²         L1       L2       L3       C²
POP         0.0228   0.9995      0.9912  −0.0000   0.0176     0.9893   0.0000   0.0000   0.0213
SCHOOL      0.8760   0.2326      0.0104   0.8786   0.2279     0.0098   0.9974   0.0000   0.0052
EMPLOY      0.1760   0.9690      0.9812   0.1026   0.0268     0.9830   0.1450   0.0567   0.0094
SERVICES    0.7894   0.3768      0.4428   0.7818   0.1927     0.4437   0.6889  −0.3765   0.1869
HOUSE       0.9851   0.0296      0.0221   0.9821   0.0350     0.0226   0.8652  −0.4558   0.0433

The one- and two-factor solutions obtained subject to the constraints C and C₊ respectively are identical and regular. Note that the loadings for the (regular) one-factor solutions are similar to the (regular) ML one-factor solutions for this problem as reported in Trendafilov (2003). The 'classical' ML one-factor solution reported in SAS (1990, Output 26.3.2) is quite different from the current one and is a Heywood case.
Note also that the regular two-factor solutions from Tables 1 and 2 are similar to the Heywood two-factor ML solutions for this problem as reported in Harman (1976, Table 10.5) and SAS (1990, Output 26.3.2).
The three-factor solutions differ slightly. The solution constrained by C may be considered very close to a Heywood case by some authors, while the corresponding solution constrained by C₊ is clearly regular.
For comparison, the solutions of the ℓ2 FA fitting problems for this particular reparameterization, (12) and (13), are reported here. They are given in Table 3 for the constraint C² ≥ 0, and in Table 4 for the constraint C² > 0.

Table 3. ℓ2 solutions with C for lower triangular reparameterization

            1-factor solution    2-factor solution            3-factor solution
            fit = 0.96475892     fit = 0.00098222             fit = 0.00000000
            L        C²          L1       L2       C²         L1       L2       L3       C²
POP         0.4006   0.8396      1.0015  −0.0000   0.0000     0.9902   0.0000   0.0000   0.0195
SCHOOL      0.7191   0.4829      0.0265   0.8745   0.2346     0.0098   0.9515  −0.0000   0.0945
EMPLOY      0.4893   0.7606      0.9695   0.1259   0.0442     0.9821   0.1521   0.1015   0.0020
SERVICES    1.0054   0.0000      0.4340   0.7802   0.2029     0.4432   0.7221  −0.2996   0.1924
HOUSE       0.7631   0.4177      0.0110   0.9872   0.0253     0.0226   0.9068  −0.3764   0.0354

The one- and two-factor solutions obtained subject to the constraints C and C₊ are almost identical. The one-factor solutions differ considerably from the corresponding ℓ1 solutions in Tables 1 and 2. Both the one- and two-factor solutions are Heywood cases. Those obtained subject to the constraint C₊ formally satisfy C² > 0 (Table 4) but for practical purposes may not be considered convincing. These two-factor solutions are also similar to the Heywood ML solutions of this problem as reported in Harman (1976, Table 10.5) and SAS (1990, Output 26.3.2). The three-factor solutions are very close to the corresponding ℓ1 solutions reported above in Tables 1 and 2, respectively. They are both regular, especially the solution obtained subject to C₊.

Table 4. ℓ2 solutions with C₊ for lower triangular reparameterization

            1-factor solution    2-factor solution            3-factor solution
            fit = 0.96476055     fit = 0.00098429             fit = 0.00000000
            L        C²          L1       L2       C²         L1       L2       L3       C²
POP         0.4006   0.8396      1.0012   0.0000   0.0007     0.9893  −0.0000  −0.0000   0.0212
SCHOOL      0.7191   0.4829      0.0266   0.8745   0.2346     0.0098   0.9974  −0.0000   0.0052
EMPLOY      0.4893   0.7606      0.9698   0.1258   0.0437     0.9830   0.1450   0.0566   0.0095
SERVICES    1.0053   0.0002      0.4341   0.7801   0.2029     0.4437   0.6889  −0.3766   0.1868
HOUSE       0.7631   0.4177      0.0110   0.9872   0.0252     0.0226   0.8652  −0.4556   0.0434
Next, the ℓ1 FA fitting problem as defined by the EVD reparameterization is solved, in which the ODEs (15)-(17) are integrated numerically. In Table 5 the one-, two- and three-factor solutions of this problem constrained by C² ≥ 0 are reported. In Table 6 the solutions of the same ℓ1 FA fitting problem (15)-(17) constrained by C² > 0 are reported.

Table 5. ℓ1 solutions with C for EVD reparameterization

            1-factor solution    2-factor solution            3-factor solution
            fit = 1.82680894     fit = 0.05478627             fit = 0.00000000
            QD       C²          QD1      QD2      C²         QD1      QD2      QD3      C²
POP         0.0228   0.9995      0.6186  −0.7744   0.0176     0.5909  −0.7876  −0.0062   0.0305
SCHOOL      0.8760   0.2326      0.6929   0.5403   0.2279     0.7712   0.5640   0.2949   0.0002
EMPLOY      0.1760   0.9690      0.6926  −0.7025   0.0268     0.6894  −0.7183   0.0939   0.0000
SERVICES    0.7894   0.3768      0.8872   0.1420   0.1927     0.8896   0.1117  −0.1956   0.1579
HOUSE       0.9851   0.0296      0.7811   0.5957   0.0350     0.7722   0.5520  −0.1483   0.0770

Table 6. ℓ1 solutions with C₊ for EVD reparameterization

            1-factor solution    2-factor solution            3-factor solution
            fit = 1.82680494     fit = 0.05478628             fit = 0.00000003
            QD       C²          QD1      QD2      C²         QD1      QD2      QD3      C²
POP         0.0228   0.9995      0.6186  −0.7744   0.0176     0.5915  −0.7896  −0.0104   0.0266
SCHOOL      0.8760   0.2326      0.6929   0.5403   0.2279     0.7699   0.5606   0.2928   0.0073
EMPLOY      0.1760   0.9690      0.6926  −0.7025   0.0268     0.6884  −0.7172   0.0902   0.0036
SERVICES    0.7894   0.3768      0.8872   0.1420   0.1927     0.8859   0.1101  −0.1787   0.1712
HOUSE       0.9851   0.0296      0.7811   0.5957   0.0350     0.7768   0.5556  −0.1584   0.0628

The one- and two-factor solutions obtained subject to the constraints C and C₊ are identical and regular. The one-factor solutions are identical to those found with the lower triangular reparameterization in Tables 1 and 2.

The loadings of the two-factor solutions differ from those found with the lower triangular reparameterization (Tables 1 and 2). This is as expected and reflects the different reparameterizations of the FA model and the corresponding starting values. It is important to note that the unique variances C² of the one- and two-factor solutions are identical for both reparameterizations.
Note that the two-factor solution (Tables 5 and 6) is similar to the canonical form of the Heywood ML solution of this problem as reported in Harman (1976, Table 10.5).
The loadings of the three-factor solutions in Tables 5 and 6 are close, but only the solution obtained subject to the constraint C₊ is regular.
For comparison, the solutions of the ℓ2 FA problems for this particular reparameterization are reported here. The ODEs (34)-(36) from Trendafilov (2003) are integrated numerically. The solutions are given in Table 7 for the constraint C² ≥ 0 and in Table 8 for the constraint C² > 0.

Table 7. ℓ2 solutions with C for EVD reparameterization

            1-factor solution    2-factor solution            3-factor solution
            fit = 0.96475898     fit = 0.00098255             fit = 0.00000412
            QD       C²          QD1      QD2      C²         QD1      QD2      QD3      C²
POP         0.4006   0.8396      0.6222  −0.7847   0.0001     0.5923  −0.7854   0.0030   0.0322
SCHOOL      0.7191   0.4829      0.7017   0.5225   0.2346     0.7687   0.5680   0.2921   0.0013
EMPLOY      0.4893   0.7606      0.7010  −0.6815   0.0441     0.6909  −0.7163   0.1013   0.0000
SERVICES    1.0054   0.0000      0.8810   0.1446   0.2029     0.9000   0.1180  −0.2305   0.1229
HOUSE       0.7631   0.4177      0.7804   0.6048   0.0253     0.7624   0.5473  −0.1166   0.1056

Table 8. ℓ2 solutions with C₊ for EVD reparameterization

            1-factor solution    2-factor solution            3-factor solution
            fit = 0.96480572     fit = 0.00101835             fit = 0.00006250
            QD       C²          QD1      QD2      C²         QD1      QD2      QD3      C²
POP         0.4006   0.8396      0.6202  −0.7804   0.0098     0.5968  −0.7883   0.0267   0.0194
SCHOOL      0.7195   0.4823      0.7018   0.5231   0.2338     0.7601   0.5708   0.2919   0.0121
EMPLOY      0.4893   0.7606      0.7036  −0.6852   0.0346     0.6898  −0.7092   0.1128   0.0082
SERVICES    1.0040   0.0041      0.8810   0.1450   0.2029     0.9272   0.1316  −0.3018   0.0315
HOUSE       0.7636   0.4170      0.7796   0.6045   0.0275     0.7467   0.5408  −0.0479   0.1477

The one- and two-factor solutions obtained subject to the constraints C and C₊ are quite close. The corresponding solutions in Table 7 are Heywood cases, while those in Table 8 are regular. The one-factor C solution is identical to the corresponding ℓ1 solution from Table 5. The two-factor ℓ2 solutions are similar to the canonical Heywood ML solution of this problem reported in Harman (1976, Table 10.5), but the C₊ solution is no longer a Heywood case. Note also that this canonical form solution (Harman, 1976, Table 10.5) is almost identical to the ML solutions reported in Trendafilov (2003). The three-factor ℓ2 solutions are relatively close to the corresponding ℓ1 solutions reported above in Tables 5 and 6. The solution obtained subject to the constraint C is a Heywood case, while the C₊ solution is 'sufficiently' regular. Note that the 'classical' three-factor ML solution in SAS (1990, Output 26.3.3) is a Heywood case. A regular three-factor ML solution was found in Trendafilov (2003) for this problem, which is surprisingly close to the three-factor ℓ1 solution in Table 6.
To summarize, the two- and three-factor ℓ1 and ℓ2 solutions are quite similar within a certain reparameterization. The solutions found are the same whether the algorithms are run with random or rational starts. All of the ℓ1 solutions are regular.
According to the classical factor analyses of the five socio-economic variables (Harman, 1976, Table 2.1), the best FA model for these data is the one with two factors (Harman, 1976; SAS, 1990). The loadings of the two-factor ℓ1 solutions (e.g. Tables 2 and 6) are different for the different FA reparameterizations. In Fig. 1 they are depicted as points in the plane; the dotted line denotes the solution with EVD reparameterization (Table 6).
Obviously, the 'shapes' of the solutions are identical. The EVD solution can be rotated to match the one with the lower triangular reparameterization and vice versa. Thus these two FA reparameterizations lead to a single solution. Indeed, this can be shown formally by applying the standard Procrustes technique. To see this, a scaling constant s, a 2 × 1 translation vector t and a 2 × 2 orthogonal matrix U are sought such that

    sQDU + 1₅tᵀ,   (18)

is as close (in the Frobenius norm) as possible to L from Table 2, where QD is taken from Table 6 and 1₅ is a 5 × 1 vector of ones. One can easily find that the required rotation U in (18) is given by the orthogonal matrix

    U = [  0.6241   0.7813
          −0.7813   0.6241 ].

The rest of the required parameters are s = 1 and t = (0.3476 × 10⁻⁴, −0.0453 × 10⁻⁴)ᵀ.
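The rotation U can be reproduced from the published loadings alone. Since s = 1 and t is essentially zero, the match reduces to plain orthogonal Procrustes, for which scipy provides a ready-made routine; the sketch below drops the scaling and translation accordingly:

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

# A = QD, two-factor EVD solution (Table 6); B = L, two-factor lower
# triangular solution (Table 2).
A = np.array([[0.6186, -0.7744],
              [0.6929,  0.5403],
              [0.6926, -0.7025],
              [0.8872,  0.1420],
              [0.7811,  0.5957]])
B = np.array([[0.9912, -0.0000],
              [0.0104,  0.8786],
              [0.9812,  0.1026],
              [0.4428,  0.7818],
              [0.0221,  0.9821]])

U, _ = orthogonal_procrustes(A, B)   # minimizes ||A U - B||_F over orthogonal U
print(np.round(U, 4))                # close to [[0.6241, 0.7813], [-0.7813, 0.6241]]
print(float(np.abs(A @ U - B).max()))   # residual of rounding order only
```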

Figure 1. Two two-factor ℓ1 solutions: EVD solution, Table 6 (dotted lines); lower triangular reparameterization solution, Table 2 (solid lines).

It is interesting to check what kind of a simple structure solution can be found by, say, varimax rotation of the EVD solution. The varimax rotation matrix U_varimax and the corresponding varimax simple structure solution B_varimax (= A U_varimax) are as follows:

    B_varimax = [  0.9910   0.0169        U_varimax = [  0.6107   0.7919
                  −0.0047   0.8786                      −0.7919   0.6107 ].
                   0.9793   0.1194
                   0.4294   0.7893
                   0.0053   0.9823 ]

Obviously, the varimax rotated solution B_varimax is quite similar to the solution obtained by the lower triangular reparameterization in Table 2. Their graphical representations nearly coincide.
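The quoted varimax result can be verified by direct multiplication, taking A = QD from the two-factor solution in Table 6:

```python
import numpy as np

A = np.array([[0.6186, -0.7744],     # QD, two-factor EVD solution (Table 6)
              [0.6929,  0.5403],
              [0.6926, -0.7025],
              [0.8872,  0.1420],
              [0.7811,  0.5957]])
U_varimax = np.array([[ 0.6107, 0.7919],
                      [-0.7919, 0.6107]])

B_varimax = A @ U_varimax            # reproduces the matrix quoted above
print(np.round(B_varimax, 4))
```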
The one-factor solutions of the ℓ1 FA differ considerably from the corresponding one-factor ℓ2 solutions. All of the one-factor ℓ2 solutions are found by starting the algorithms with both rational and random initial values. The situation with the one-factor ℓ1 solutions is more complicated. The reported ℓ1 solutions (identical for the different FA reparameterizations) are found by starting the algorithms with random initial values. It turns out that the algorithms converge to three local minima. The lowest of them (and the most frequently hit) is believed to be the global minimum and is reported in the tables. The remaining two local minima are given in Table 9. It is surprising to note that the worst local minimum (with the worst ℓ1 fit) is reached by starting the algorithms with the rational start. Note also that this worst ℓ1 local minimum is very similar to the 'classical' ML solution (a Heywood case), e.g. SAS (1990, Output 26.3.1). The other local minimum is close to the one-factor ℓ2 solutions (e.g. Table 3) and is a Heywood case as well. These three ℓ1 one-factor solutions are depicted in Fig. 2.

Table 9. Two Heywood one-factor ℓ1 solutions for five socio-economic variables (Harman, 1976, Table 2.1)

            ℓ1 1-factor solution      ℓ1 1-factor solution
            fit = 2.15570432          fit = 2.46370857
            L        C²               L        C²
POP         0.4128   0.8296           0.9259   0.1427
SCHOOL      0.6509   0.5763           0.1468   0.9784
EMPLOY      0.4841   0.7656           1.0499   0.0000
SERVICES    1.0621   0.0000           0.4899   0.7600
HOUSE       0.7321   0.4641           0.1160   0.9866

The two-factor solution is commonly considered the most appropriate for this data set. However, from Fig. 1 one can see that the 'shape' of the different two-factor solutions is quite flat. This implies that an appropriate one-dimensional projection of the two-factor solution can provide nearly the same information about the variables of the problem as the two-factor solution does. Suppose the two-dimensional points of the variables are projected onto the line passing through POP and SCH. Then the projected variables will have a structure quite similar to the one-factor ℓ1 solution from Table 2. In fact it is sufficient to note that the projections and the variables in the one-factor ℓ1 solution from Table 2 have the same order according to their magnitudes: POP, EMP, SER, SCH and HOU (see also Fig. 1). As was mentioned before, the 'classical' two-factor ML solutions for this problem (SAS, 1990, Output 26.3.2) are quite similar to the ℓ1 results in Tables 1 and 2 above and thus have the same order of projections. The variables in the corresponding 'classical' one-factor ML solution (SAS, 1990, Output 26.3.1) have a different order: HOU, SCH, SER, POP and EMP; that is, it cannot be obtained from the 'classical' two-factor solution by simply projecting it onto a one-dimensional factor space. This seems to be another indication that the 'classical' one-factor ML solution is invalid, in addition to the fact that it is a Heywood case. In a similar way one can check that the local minima reported in Table 9 cannot be obtained from the two-factor solution (Table 2), which indicates that they are invalid factor solutions, in addition to the fact that they are both Heywood cases.

Figure 2. Plot of three one-factor ℓ1 solutions obtained with different initial values for five socio-economic variables (Harman, 1976, Table 2.1). The solid line depicts the proper solution from Table 2, the dotted line the solution with the worst fit of 2.46 from Table 9.
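The projection argument can be replayed directly from the Table 2 two-factor loadings:

```python
import numpy as np

names = ['POP', 'SCHOOL', 'EMPLOY', 'SERVICES', 'HOUSE']
L2 = np.array([[0.9912, -0.0000],    # two-factor l1 solution, Table 2
               [0.0104,  0.8786],
               [0.9812,  0.1026],
               [0.4428,  0.7818],
               [0.0221,  0.9821]])

direction = L2[1] - L2[0]            # line through the POP and SCHOOL points
proj = (L2 - L2[0]) @ direction      # scalar projections along that line
order = [names[i] for i in np.argsort(proj)]
print(order)   # ['POP', 'EMPLOY', 'SERVICES', 'SCHOOL', 'HOUSE']
```

The projected order agrees with the magnitude order of the one-factor ℓ1 loadings in Table 2, as claimed above.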

5. Discussion and concluding remarks


In this paper, the projected gradient approach is applied to solve a more resistant
version of the FA fitting problem. The standard ℓ2 goodness-of-fit function is replaced by
a smooth approximation of the ℓ1 matrix norm. The proposed reformulation of the FA
fitting problem is less sensitive to large data-to-model deviations and may produce a
proper solution where the standard ℓ2 or ML fitting results in a Heywood case. This is
illustrated with the notorious five socio-economic variables of Harman (1976, Table 2.1).
In such situations the ℓ1 FA may be preferred. The reader is warned that the ℓ1
reformulation of FA is not an escape from Heywood case problems. It is simply
another goodness-of-fit measure which may be more appropriate for certain data. The
reader should also keep in mind that ℓ1 FA involves heavier computation than the
standard ℓ2 or ML versions, because the (smoothed) ℓ1 goodness-of-fit measure is more
sophisticated. One may also expect the ℓ1 FA to be more vulnerable to local minima.

Such behaviour was demonstrated in the numerical examples presented: the ℓ1 FA
algorithm converges to three local minima, but only for the one-factor solutions.
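The smoothing discussed above can be illustrated with a standard device: each absolute value |x| in the elementwise ℓ1 discrepancy is replaced by the differentiable surrogate sqrt(x² + ε). The following is a minimal Python/NumPy sketch of such a discrepancy for the FA model; the function name, the particular surrogate and the value of ε are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def smoothed_l1_discrepancy(R, L, psi, eps=1e-6):
    """Smoothed elementwise l1 discrepancy between the sample correlation
    matrix R and the FA model LL' + Psi^2.  Each |x| is approximated by
    sqrt(x^2 + eps), which is differentiable everywhere (illustrative
    smoothing; the paper's exact approximation may differ)."""
    E = R - L @ L.T - np.diag(psi**2)   # residual matrix R - LL' - Psi^2
    return np.sum(np.sqrt(E**2 + eps))
```

As ε tends to zero the value tends to the elementwise ℓ1 norm of the residual; the smoothing removes the non-differentiability at exact fits, which is what makes gradient-based procedures applicable.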
In our opinion the lower triangular reparameterization of the FA model is preferable
to the EVD one, because it leads to solutions having a 'simple structure'-like pattern.
Another advantage is that the corresponding projected gradient algorithm is faster, as it
evolves on a linear subspace, whereas the EVD algorithm evolves on a curvilinear
set (manifold).
It is demonstrated that the projected gradient approach is a rather general strategy
for attacking such types of matching problems, and can be used with other smooth robust
functions as well, such as the Huber M-function and the Hampel-type function
(Rousseeuw & Leroy, 1987), or other smooth approximations of the ℓ1 norm (Osborne,
Presnell, & Turlach, 2000).
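For instance, the Huber M-function just mentioned is quadratic near the origin and linear in the tails, so it penalizes large residuals less severely than the ℓ2 loss. A sketch follows; the tuning constant c = 1.345 is a conventional choice in robust statistics, not a value taken from the paper.

```python
import numpy as np

def huber_rho(x, c=1.345):
    """Huber M-function: 0.5*x^2 for |x| <= c, and the linear
    continuation c*(|x| - 0.5*c) beyond, so the function and its
    first derivative are continuous at |x| = c."""
    ax = np.abs(x)
    return np.where(ax <= c, 0.5 * x**2, c * (ax - 0.5 * c))
```

Applied elementwise to the residual matrix and summed, this yields another smooth robust discrepancy to which the same projected gradient machinery applies.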
This approach involves first-derivative information and leads to globally convergent
(matrix) procedures for the simultaneous estimation of the FA parameters. The
computational procedures are implemented in MATLAB and based on the MATLAB
ODE suite of numerical integrators (Shampine & Reichelt, 1997) for solving initial
value problems (IVPs) for ordinary differential equations (ODEs).
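The overall scheme can be mimicked in a few lines: write the negative gradient of the goodness-of-fit function as the right-hand side of an ODE and integrate the flow until it reaches a stationary point. The sketch below is a simplified stand-in, not the paper's algorithm: it uses SciPy's solve_ivp in place of the MATLAB ODE suite, the plain ℓ2 loss instead of the smoothed ℓ1 one, and omits the projection step that preserves the parameter structure; all names are hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp

def l2_fit_flow(R, q, t_end=200.0):
    """Integrate the negative-gradient flow of the least-squares FA fit
    f(L, psi) = 0.5 * ||R - LL' - diag(psi^2)||_F^2 for a p x p
    correlation matrix R and q common factors."""
    p = R.shape[0]

    def unpack(x):
        L = x[:p * q].reshape(p, q)
        psi = x[p * q:]
        return L, psi

    def rhs(t, x):
        L, psi = unpack(x)
        E = R - L @ L.T - np.diag(psi**2)   # residual matrix
        dL = 2.0 * E @ L                    # negative gradient w.r.t. L
        dpsi = 2.0 * np.diag(E) * psi       # negative gradient w.r.t. psi
        return np.concatenate([dL.ravel(), dpsi])

    # arbitrary starting values for the flow
    x0 = np.concatenate([0.1 * np.ones(p * q), 0.9 * np.ones(p)])
    sol = solve_ivp(rhs, (0.0, t_end), x0, method='BDF',
                    rtol=1e-6, atol=1e-8)
    return unpack(sol.y[:, -1])             # state at the final time
```

Along the flow f decreases monotonically, so for large t the state settles at a stationary point of the objective; the structure-preserving projections of the paper's algorithms would be applied inside rhs.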

Acknowledgements
The author thanks the editor and two anonymous referees for their understanding and
suggestions. They have helped to improve some of the results which are presented here.

References
Anderson, T. W. (1984). Introduction to multivariate statistical analysis (2nd ed.). New York: Wiley.
Bartholomew, D. J., & Knott, M. (1999). Latent variable models and factor analysis (2nd ed.). London: Arnold.
Chu, M. T., & Trendafilov, N. T. (2001). The orthogonally constrained regression revisited. Journal of Computational and Graphical Statistics, 10, 746–771.
Harman, H. H. (1976). Modern factor analysis (3rd ed.). Chicago: University of Chicago Press.
Helmke, U., & Moore, J. B. (1994). Optimization and dynamical systems. London: Springer-Verlag.
Jöreskog, K. G. (1977). Factor analysis by least-squares and maximum likelihood methods. In K. Enslein, A. Ralston & H. S. Wilf (Eds), Mathematical methods for digital computers, Vol. 3 (pp. 125–153). New York: Wiley.
Magnus, J. R., & Neudecker, H. (1988). Matrix differential calculus with applications in statistics and econometrics. New York: Wiley.
Osborne, M. R., Presnell, B., & Turlach, B. A. (2000). On the LASSO and its dual. Journal of Computational and Graphical Statistics, 9, 319–337.
Pison, G., Rousseeuw, P. J., Filzmoser, P., & Croux, C. (2003). Robust factor analysis. Journal of Multivariate Analysis, 84, 145–172.
Rousseeuw, P. J., & Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41, 212–223.
Rousseeuw, P. J., & Leroy, A. M. (1987). Robust regression and outlier detection. New York: Wiley.
SAS Institute (1990). SAS/STAT user's guide (4th ed.), Version 6, Vol. 1. Cary, NC: SAS Institute.
Shampine, L. F., & Reichelt, M. W. (1997). The MATLAB ODE suite. SIAM Journal on Scientific Computing, 18, 1–22.
Trendafilov, N. T. (2003). Dynamical system approach to factor analysis parameter estimation. British Journal of Mathematical and Statistical Psychology, 56, 27–46.

Received 19 April 2002; revised version received 15 March 2004
