Simultaneous Factor Analysis in Several Populations
PSYCHOMETRIKA, VOL. 36, NO. 4
DECEMBER, 1971
K. G. JÖRESKOG
EDUCATIONAL TESTING SERVICE
2. A General Model
with E(f_g) = 0 and E(z_g) = 0 and Λ_g a factor pattern of order p_g × k_g. The usual factor analytic assumptions then imply that

(2) Σ_g = Λ_g Φ_g Λ_g' + Ψ_g ,

where Φ_g is the variance-covariance matrix of f_g and Ψ_g is the diagonal variance-covariance matrix of z_g.
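To make the structure in (2) concrete, the following is a minimal numpy sketch for a single group; the sizes (p_g = 4, k_g = 2) and all numerical values are hypothetical, chosen only to illustrate the structure.

    import numpy as np

    # Hypothetical factor pattern Lambda_g of order p_g x k_g = 4 x 2.
    Lam = np.array([[1.0, 0.0],
                    [0.8, 0.0],
                    [0.0, 1.0],
                    [0.0, 0.7]])
    # Hypothetical factor variance-covariance matrix Phi_g (k_g x k_g).
    Phi = np.array([[1.0, 0.3],
                    [0.3, 1.0]])
    # Hypothetical diagonal variance-covariance matrix Psi_g of z_g.
    Psi = np.diag([0.4, 0.5, 0.3, 0.6])

    # Implied variance-covariance matrix of the observations, as in (2).
    Sigma = Lam @ Phi @ Lam.T + Psi
    assert np.allclose(Sigma, Sigma.T)   # Sigma_g is symmetric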
In addition to assuming that a factor analytic model holds in each population, the model may specify that certain parameters in Λ_g, Φ_g, Ψ_g, g = 1, 2, ..., m, have assigned values and that some set of unknown elements in Λ_g, Φ_g and Ψ_g are the same for all g. The most common situation is when the same battery has been administered to each group and the whole factor pattern Λ_g is assumed to be invariant over groups. This case will be considered separately in section 3.
(6) d = Σ_{g=1}^{m} ½ p_g(p_g + 1) − t
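As a check on this count, the sketch below computes d from the group sizes p_g and the number t of distinct free parameters. The call uses four groups of nine tests with t = 78, which reproduces the 102 degrees of freedom reported for the hypothesis H_Λ in the worked example later in the paper; the value t = 78 is inferred from that example, not quoted from it.

    def degrees_of_freedom(p_sizes, t):
        """d in (6): the sum of p_g (p_g + 1) / 2 over groups, minus
        the number t of distinct parameters estimated under the model."""
        return sum(p * (p + 1) // 2 for p in p_sizes) - t

    print(degrees_of_freedom([9, 9, 9, 9], 78))   # prints 102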
Then it follows from the corresponding results for a single population (see, e.g., Lawley & Maxwell [1963], Chapter 6, or Jöreskog [1969]) that for g = 1, 2, ..., m,
We shall also need expressions for E(∂²F/∂θ_i ∂θ_j), where θ_i and θ_j are any two parameters. If θ_i is an element of Λ_g, Φ_g or Ψ_g and θ_j is an element of Λ_h, Φ_h or Ψ_h, g ≠ h, then E(∂²F/∂θ_i ∂θ_j) is zero. Otherwise, if both θ_i and θ_j are elements of Λ_g, Φ_g or Ψ_g, the required second-order derivatives, given in (10a)-(10f), can be expressed in terms of the elements of the matrices defined in (9a)-(9e), among them

(9a) Σ⁻¹Λ

and

(9e) Γ = ΦΛ'Σ⁻¹ΛΦ.

Here we have omitted the subscript g for simplicity of notation.
The function F is regarded as a function of the elements of Λ_g, Φ_g and Ψ_g, g = 1, 2, ..., m, and is to be minimized with respect to these, taking into account that some elements may be fixed and some may be constrained to be equal to others. Such a minimization problem may be solved as follows.
Let θ_g be a vector of all the elements in Λ_g, Φ_g and Ψ_g arranged in a prescribed order. Since Φ_g is symmetric, only the elements in the lower half and the diagonal are counted. Then θ_g is of order r_g = p_g k_g + ½k_g(k_g + 1) + p_g. Let θ' = (θ_1', θ_2', ..., θ_m'). Then θ consists of all the elements of all the parameter matrices and is of order r = r_1 + r_2 + ··· + r_m. The function F may now be regarded as a function f(θ) of θ_1, θ_2, ..., θ_m, which is continuous and has continuous first- and second-order derivatives ∂F/∂θ_i and ∂²F/∂θ_i ∂θ_j, except where any Σ_g is singular. The totality of these derivatives is represented by a gradient vector ∂F/∂θ and a symmetric second-order derivative matrix ∂²F/∂θ ∂θ'.
The vector ∂F/∂θ of order r is formed by arranging the elements of the derivative matrices (8a)-(8c) in the same order as the elements of Λ_g, Φ_g and Ψ_g, g = 1, 2, ..., m, in θ. As an approximation to the r × r matrix ∂²F/∂θ ∂θ' we use the block diagonal matrix

E(∂²F/∂θ ∂θ') = diag [E(∂²F/∂θ_1 ∂θ_1'), E(∂²F/∂θ_2 ∂θ_2'), ..., E(∂²F/∂θ_m ∂θ_m')],

where E(∂²F/∂θ_g ∂θ_g') is a symmetric matrix of order r_g × r_g formed by computing (10a)-(10f) and arranging these so that the order of rows and columns corresponds to the order of the parameters in θ_g.
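Because the cross-group blocks vanish, this matrix can be assembled one group at a time. A minimal sketch, with hypothetical per-group blocks standing in for the quantities computed from (10a)-(10f):

    import numpy as np
    from scipy.linalg import block_diag

    # Hypothetical blocks E(d2F / dtheta_g dtheta_g'), of orders
    # r_1 x r_1 = 2 x 2 and r_2 x r_2 = 3 x 3.
    info_1 = np.array([[2.0, 0.5],
                       [0.5, 1.0]])
    info_2 = np.array([[3.0, 0.2, 0.0],
                       [0.2, 1.5, 0.1],
                       [0.0, 0.1, 2.5]])

    # The full r x r expected second-order derivative matrix is block
    # diagonal across groups.
    info = block_diag(info_1, info_2)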
Now let some r − s of the θ's be fixed and denote the remaining θ's by π_1, π_2, ..., π_s, s ≤ r. The function F is now regarded as a function G(π) of π_1, π_2, ..., π_s. Derivatives ∂G/∂π and E(∂²G/∂π ∂π') are obtained from ∂F/∂θ and E(∂²F/∂θ ∂θ') by omitting rows and columns corresponding to the fixed θ's. Among π_1, π_2, ..., π_s let there be some t distinct and independent parameters denoted κ_1, κ_2, ..., κ_t, t ≤ s, so that each π_i is equal to one and only one κ_j, but possibly several π's equal the same κ. Let K = (k_ij) be a matrix of order s × t with elements k_ij = 1 if π_i = κ_j and k_ij = 0 otherwise. The function G (or F) is now a function H(κ) of the independent arguments κ_1, κ_2, ..., κ_t, and we have

(12) ∂H/∂κ = K'(∂G/∂π)

(13) E(∂²H/∂κ ∂κ') = K' E(∂²G/∂π ∂π') K.
Thus, the first-order and expected second-order derivatives of H are simple
sums of the corresponding derivatives of G.
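A minimal sketch of (12) and (13) with a hypothetical constraint pattern: s = 4 free parameters π reduce to t = 3 distinct parameters κ, with π_2 and π_4 tied to the same κ_2.

    import numpy as np

    # K[i, j] = 1 iff pi_i equals kappa_j; rows 2 and 4 share kappa_2.
    K = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])

    dG_dpi = np.array([0.2, -0.1, 0.5, 0.3])   # hypothetical dG/dpi
    info_pi = np.diag([1.0, 2.0, 1.5, 2.5])    # hypothetical E(d2G/dpi dpi')

    dH_dkappa = K.T @ dG_dpi          # (12): entries of tied pi's add up
    info_kappa = K.T @ info_pi @ K    # (13)

Since each row of K has a single unit element, premultiplying by K' does nothing more than add up the entries belonging to the same κ.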
For the minimization of H(κ) we use a modification of the method of Fletcher and Powell [1963], for which a computer program has been written by Gruvaeus and Jöreskog [1970]. This method makes use of a symmetric matrix E of order t × t, which is evaluated in each iteration. Initially E is any positive definite matrix approximating the inverse of ∂²H/∂κ ∂κ'. In subsequent iterations E is improved, using information built up about the function, so that ultimately E converges to an approximation of the inverse of ∂²H/∂κ ∂κ' at the minimum. If t is large, the number of iterations may be excessive, but it can be considerably decreased by the provision of a good starting point for κ and a good initial estimate of E.
In principle, a good initial estimate of E may be obtained by computing ∂²H/∂κ ∂κ' at the starting point and then inverting this matrix. However, in our problem the second-order derivatives are rather complicated and time-consuming to compute. Instead, we therefore use estimates of the second-order derivatives provided by the information matrix E(∂²H/∂κ ∂κ'). In addition to being more easily evaluated, this matrix also yields other valuable information. The inverse of E(∂²H/∂κ ∂κ') evaluated at the minimum of H is an estimate of the variance-covariance matrix of the estimated parameters κ̂_1, κ̂_2, ..., κ̂_t. This may be used to obtain standard errors of the estimated parameters.
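A minimal sketch of this last step, with a hypothetical 2 × 2 information matrix at the minimum:

    import numpy as np

    # Hypothetical E(d2H / dkappa dkappa') at the minimum of H.
    info_kappa = np.array([[4.0, 1.0],
                           [1.0, 2.0]])

    cov_kappa = np.linalg.inv(info_kappa)      # estimated covariance matrix
    std_errors = np.sqrt(np.diag(cov_kappa))   # standard errors of the kappa's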
The starting point κ may be chosen arbitrarily, but the closer it is to the final solution, the fewer iterations will be required to find it.
The minimization method converges quadratically from an arbitrary starting
point to a local minimum of the function. If several local minima exist there
is no guarantee that the method will converge to the absolute minimum.
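The Gruvaeus-Jöreskog program itself is not reproduced here; the following is only a generic sketch of the Davidon-Fletcher-Powell update underlying the Fletcher and Powell [1963] method, showing how E is revised from a step delta in κ and the resulting change gamma in the gradient.

    import numpy as np

    def dfp_update(E, delta, gamma):
        """One Davidon-Fletcher-Powell revision of the inverse-Hessian
        approximation E; delta is the step in kappa and gamma the
        corresponding change in the gradient."""
        E_gamma = E @ gamma
        return (E
                + np.outer(delta, delta) / (delta @ gamma)
                - np.outer(E_gamma, E_gamma) / (gamma @ E_gamma))

    # Each iteration steps along -E @ gradient (with a line search,
    # omitted here) and then updates E, so that E tends to the inverse
    # of the second-order derivative matrix at the minimum.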
3. A Model of Factorial Invariance

3.1 The Model
Perhaps the most common application of the method just described will be the case in which the same tests have been administered in each population and it is hypothesized that the factor pattern Λ is invariant over populations. Although this model is a special case of the general model described in the previous section, it deserves separate discussion.
In this case p_1 = p_2 = ··· = p_m = p and k_1 = k_2 = ··· = k_m = k, the matrices Σ_g and Ψ_g, g = 1, 2, ..., m, are all of order p × p, and the Φ_g, g = 1, 2, ..., m, are all of order k × k. The common factor pattern Λ is of order p × k. The regression of x_g on f_g is [cf. (1)]

(15) x_g = μ_g + Λf_g + z_g

and the variance-covariance matrix Σ_g is

(16) Σ_g = ΛΦ_gΛ' + Ψ_g .
In the special case of two populations, m = 2, a stricter form of invariance was considered by Lawley and Maxwell [1963, Chapter 8]. This requires not only the regression matrix Λ in (15) to be invariant but also the variances about the regression, i.e., Ψ_1 = Ψ_2. This type of restriction can easily be incorporated using the general approach of the preceding section.
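In the notation of the preceding section, the restriction Ψ_1 = Ψ_2 ties each unique variance in the first group to the corresponding one in the second through the matrix K. A minimal sketch of the Ψ-portion of K, for a hypothetical battery of p = 3 tests:

    import numpy as np

    p = 3   # hypothetical number of tests

    # The 2p unique variances (p per group) map onto p distinct kappa's;
    # stacking two identity blocks ties them pairwise across groups.
    K_psi = np.vstack([np.eye(p),    # psi-elements of group 1
                       np.eye(p)])   # psi-elements of group 2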
3.2 Identification of Parameters
Suppose that the Λ in (16) is replaced by Λ* = ΛT⁻¹ and each Φ_g is replaced by Φ_g* = TΦ_gT', g = 1, 2, ..., m, where T is an arbitrary nonsingular matrix of order k × k. Then each Σ_g remains the same, so that the function F in (5) is unaltered. Since the matrix T has k² independent elements, at least k² independent conditions must be imposed on the parameters in Λ, Φ_1, Φ_2, ..., Φ_m to make these uniquely defined.
Within the framework of the general procedure of the previous section, the most convenient way of doing this is to let all the Φ_g be free and to fix one nonzero element and at least k − 1 zeros in each column of Λ. In an exploratory study one can fix exactly k − 1 zeros in almost arbitrary positions.
For example, one may choose zero loadings where one thinks there should be "small" loadings in the factor pattern. The resulting solution may be rotated further, if desired, to facilitate interpretation. In a confirmatory study, on the other hand, the positions of the fixed zeros, which often exceed k − 1 in each column, are given a priori by a hypothesis, and the resulting solution cannot be rotated without destroying the fixed zeros.
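A sketch of this identification device for the p = 9, k = 3 setting of the worked example below, mirroring the pattern (31) in which tests 1, 4 and 7 are chosen pure in their respective factors:

    import numpy as np

    p, k = 9, 3
    fixed_value = np.zeros((p, k))   # fixed zeros everywhere by default
    fixed_value[0, 0] = 1.0          # test 1 pure in factor 1
    fixed_value[3, 1] = 1.0          # test 4 pure in factor 2
    fixed_value[6, 2] = 1.0          # test 7 pure in factor 3

    free = np.ones((p, k), dtype=bool)   # the x's: elements to estimate
    free[[0, 3, 6], :] = False           # rows 1, 4 and 7 entirely fixed

    # Each column now carries one fixed nonzero and k - 1 = 2 fixed
    # zeros: k * k = 9 conditions in all, matching the k^2 elements of T.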
(23) S* = D S_0 D.

With this scaling, following the approach of the previous section, the factor loadings are of the same order of magnitude as is usual when correlation matrices are analyzed and factors are standardized to unit variances. This makes it easier to choose start values for the minimization (see section 3.5) and to interpret the results.
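A minimal sketch of (23), assuming D is the diagonal matrix of reciprocal standard deviations of the pooled matrix S_0 (an assumption here), so that every group's covariance matrix is rescaled by one common D:

    import numpy as np

    def rescale(S_g, S0):
        """Rescale a group dispersion matrix by D = diag(S0)^(-1/2),
        the same D for every group, as in (23)."""
        D = np.diag(1.0 / np.sqrt(np.diag(S0)))
        return D @ S_g @ D

Note that this is one common rescaling applied to all groups; it is not the group-by-group standardization that the next paragraph rules out.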
It should be pointed out that it is not permissible to standardize the variables in each group and to analyze the correlation matrices instead of the variance-covariance matrices. This violates the likelihood function (4), which is based on the distribution of the observed variances and covariances. Invariance of factor patterns can be expected to hold only when the standardization of both tests and factors is relaxed.
(24) H_Φ : Φ_1 = Φ_2 = ··· = Φ_m .

This hypothesis H_ΛΦΨ can be tested directly on the basis of the pooled S in (21). The test of H_ΛΦΨ against H_Σ uses a χ² with

d_ΛΦΨ = ½p(p + 1) − pk + q − ½k(k + 1) − p

degrees of freedom.
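A sketch of this count, assuming q denotes the number of fixed elements in Λ (so q = 9 for a pattern like (31) below):

    def df_lambda_phi_psi(p, k, q):
        """Degrees of freedom for testing H_LambdaPhiPsi against
        H_Sigma: p(p+1)/2 - pk + q - k(k+1)/2 - p."""
        return p * (p + 1) // 2 - p * k + q - k * (k + 1) // 2 - p

    print(df_lambda_phi_psi(9, 3, 9))   # 12 for p = 9, k = 3, q = 9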
TABLE 1a-b
Intercorrelation Matrices

(a)
                          1     2     3     4     5     6     7     8     9
 1 Visual Perception     --   .32   .48   .28   .26   .40   .42   .12   .23
 2 Cubes                .24    --   .33   .01   .01   .26   .32   .05  -.04
 3 Paper Form Board     .23   .22    --   .06   .01   .10   .22   .03   .01
 4 General Information  .32   .05   .23    --   .75   .60   .15  -.08  -.05
 5 Sentence Completion  .35   .23   .18   .68    --   .63   .07   .06   .10
 6 Word Classification  .36   .10   .11   .59   .66    --   .36   .19   .24
 7 Figure Recognition   .22   .01  -.07   .09   .11   .12    --   .29   .19
 8 Object-Number       -.02  -.01  -.13   .05   .08   .03   .19    --   .38
 9 Number-Figure        .09  -.14  -.06   .16   .02   .12   .15   .29    --

(b)
                          1     2     3     4     5     6     7     8     9
 1 Visual Perception     --   .34   .41   .38   .40   .42   .35   .16   .35
 2 Cubes                .32    --   .21   .32   .16   .13   .27   .01   .27
 3 Paper Form Board     .34   .18    --   .31   .24   .35   .30   .09   .09
 4 General Information  .31   .24   .31    --   .69   .55   .17   .31   .34
 5 Sentence Completion  .22   .16   .29   .62    --   .65   .20   .30   .27
 6 Word Classification  .27   .20   .32   .57   .61    --   .31   .34   .27
 7 Figure Recognition   .48   .31   .32   .18   .20   .29    --   .31   .38
 8 Object-Number        .20   .01   .15   .06   .19   .15   .36    --   .38
 9 Number-Figure        .42   .28   .40   .11   .07   .18   .35   .44    --
"1 0
X x x
X x X
0 1 0
(31) A,, = X X x
X X x
0 0 I
X x xl
X x xJ
Here the zeros and ones stand for fixed values and the x's for parameters to be estimated. Tests 1, 4 and 7 have been chosen to be pure in their respective factors. The test of H_Λ gives χ² = 90.57 with 102 degrees of freedom, which is not significant. Thus, we cannot reject the hypothesis that there is an invariant factor pattern with three factors. To strengthen the model we
TABLE 2
Summary of Analyses
now make use of our knowledge about the tests and hypothesize that the invariant factor pattern has the specific form

            | 1  0  0 |
            | x  0  0 |
            | x  0  0 |
            | 0  1  0 |
(32)    Λ = | 0  x  0 |
            | 0  x  0 |
            | 0  0  1 |
            | 0  0  x |
            | 0  0  x |
i.e., we assume that Λ has a nonoverlapping group structure, in which the first three tests load on the first factor only, the next three on the second factor only and the last three on the third factor only. As before, we put no constraints on the Φ's and Ψ's. A test of this hypothesis gives χ² = 131.24 with 114 degrees of freedom. This has a probability level of about 0.13. Thus we cannot reject the hypothesis that the invariant factor pattern is of the specified form. An examination of the ψ's, in relation to their standard errors in the solution under H_Λ, revealed that many of them were not sufficiently different to be considered distinct. This suggests that one should also examine the hypothesis H_ΛΨ, that a stricter form of invariance holds, namely one in which the ψ's are also the same for all populations. A test of H_ΛΨ gives χ² = 172.14 with 141 degrees of freedom. This is just significant at the 5% level. The maximum likelihood solution under H_ΛΨ is shown in Table 3. Finally, to complete the sequence of hypotheses we consider the
TABLE 3
Maximum Likelihood Solution under H_ΛΨ
(Asterisks Denote Parameter Values Specified by Hypothesis)
(Estimated factor variance-covariance matrices Φ̂_1, Φ̂_2, Φ̂_3 and Φ̂_4, each with rows and columns corresponding to the factors S, V and M.)
REFERENCES

Box, G. E. P. A general distribution theory for a class of likelihood criteria. Biometrika, 1949, 36, 317-346.
Fletcher, R. & Powell, M. J. D. A rapidly convergent descent method for minimization. The Computer Journal, 1963, 6, 163-168.
Gruvaeus, G. T. & Jöreskog, K. G. A computer program for minimizing a function of several variables. Research Bulletin 70-14. Princeton, N. J.: Educational Testing Service, 1970.
Holzinger, K. J. & Swineford, F. A study in factor analysis: The stability of a bi-factor solution. Supplementary Educational Monograph No. 48. Chicago: University of Chicago, 1939.
Jöreskog, K. G. Some contributions to maximum likelihood factor analysis. Psychometrika, 1967, 32, 443-482.
Jöreskog, K. G. A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 1969, 34, 183-220.
Lawley, D. N. A note on Karl Pearson's selection formulae. Proceedings of the Royal Society of Edinburgh, Section A, 1943-44, 62, 28-30.
Lawley, D. N. & Maxwell, A. E. Factor analysis as a statistical method. London: Butterworths, 1963.
McGaw, B. & Jöreskog, K. G. Factorial invariance of ability measures in groups differing in intelligence and socioeconomic status. British Journal of Mathematical and Statistical Psychology, 1971, 24, in press.
Meredith, W. Rotation to achieve factorial invariance. Psychometrika, 1964, 29, 187-206. (a)
Meredith, W. Notes on factorial invariance. Psychometrika, 1964, 29, 177-185. (b)
van Thillo, M. & Jöreskog, K. G. A general computer program for simultaneous factor analysis in several populations. Research Bulletin 70-62. Princeton, N. J.: Educational Testing Service, 1970.