
Journal of Family Psychology Copyright 2005 by the American Psychological Association

2005, Vol. 19, No. 1, 121–132 0893-3200/05/$12.00 DOI: 10.1037/0893-3200.19.1.121

Cluster Analysis in Family Psychology Research


David B. Henry, Patrick H. Tolan, and Deborah Gorman-Smith
University of Illinois at Chicago

This article discusses the use of cluster analysis in family psychology research. It provides an
overview of potential clustering methods, the steps involved in cluster analysis, hierarchical
and nonhierarchical clustering methods, and validation and interpretation of cluster solutions.
The article also reviews 5 uses of clustering in family psychology research: (a) deriving
family types, (b) studying families over time, (c) as an interface between qualitative and
quantitative methods, (d) as an alternative to multivariate interactions in linear models, and
(e) as a data reduction technique for small samples. The article concludes with some cautions
for using clustering in family psychology research.

Keywords: cluster analysis, research methodology, qualitative data, multivariate analysis

David B. Henry, Patrick H. Tolan, and Deborah Gorman-Smith, Institute for Juvenile Research,
Department of Psychiatry, University of Illinois at Chicago.
Correspondence concerning this article should be addressed to David B. Henry, University of
Illinois at Chicago, Department of Psychiatry, 840 South Wood Street, Chicago, IL 60612. E-mail:
dhenry@uic.edu

With the application of systems theory in family psychology (Bateson, Jackson, Haley, & Weakland,
1956; Bowen, 1960) came the need to simultaneously consider related dimensions when describing
clinical phenomena and families (Haley, 1971). However, the statistical methods preferred by
family researchers are more appropriate for analyzing individual differences than for describing
groups of families along multiple dimensions simultaneously (Mandara, 2003). Better describing
meaningful subgroups of families and linking these subgroups to development, risk, and
intervention requires a method more attuned to this multidimensional perspective. In this
article, we discuss cluster analysis, an approach that can better connect family psychology
research with its theoretical and clinical interests.

We use the term cluster analysis to refer to a general approach composed of several multivariate
methods appropriate for research within family psychology (Mandara, 2003). Referred to as
person-oriented methods by some (Bergman, 1996; Bergman & Magnusson, 1997; Cairns, Bergman, &
Kagan, 1998) in contrast to variable-oriented methods, cluster analysis identifies and describes
groups of individual cases defined by similarities along multiple dimensions of interest. These
groupings can form the basis for understanding normal development, risk, or other outcomes.

Clustering methods are used infrequently in psychological research on families (Bray, Maxwell, &
Cole, 1995; Mandara, 2003). To encourage more frequent consideration of clustering methods by
family researchers, we suggest important steps in applying cluster analysis, describe the
primary methods used, and review examples of clustering applications in family research. In
addition, we discuss some cautions for using these methods with family data.

Overview of Clustering Methods

The logic of clustering differs from that of methods that emphasize relations among variables.
Fundamentally, clustering involves sorting cases or variables according to their similarity on
one or more dimensions and producing groups that maximize within-group similarity and minimize
between-group similarity. Kaufman and Rousseeuw (1990, p. 1) defined cluster analysis as the
classification of similar objects into groups, where the number of groups, as well as their
forms, may be unknown. Critics point to the unknown aspects of clustering and its relative lack
of statistical sophistication as warranting caution in its use (Morgan & Ray, 1995).

Numerous methods exist for detecting clusters in multivariate data (Arabie & Hubert, 1992). The
strategies differ in the ways they define groups and in the ways they identify groupings in the
data. The measures taken to prepare data for clustering, the statistical methods employed, and
the validation and interpretation of cluster solutions affect the outcome. Each step in
clustering, from choosing variables to interpreting solutions, requires an awareness of the
limitations of clustering methods.

Preparing to Apply Cluster Methods

Theoretically driven measures. Clustering algorithms can always find some clusters, even in data
that are randomly distributed. Therefore, selecting variables consistent with relevant
theoretical perspectives and expectations is critical. However, there is little benefit from
redundant measures. Therefore, when multiple measures of the same construct are available, it is
preferable, prior to cluster analysis, to combine different sources of information into higher
order measures based on confirmatory factor analysis or other methods (Gorman-Smith, Tolan, &
Henry, 2000).
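
The point that clustering algorithms return clusters even from random data can be illustrated with a minimal sketch. It assumes a plain K-means routine (the nonhierarchical method described later in this article) written from scratch in Python; the function name `kmeans`, the seeds, the sample size of 200, and the choice of three clusters are all hypothetical choices for illustration and are not taken from any study cited here.

```python
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain K-means on a list of equal-length tuples; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k cases drawn at random
    labels = None
    for _ in range(max_iter):
        # Assign every case to the nearest center (squared Euclidian distance).
        new_labels = [
            min(range(k), key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:  # stop when no case changes cluster
            break
        labels = new_labels
        # Update each center to the mean of the cases assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical data: 200 cases drawn uniformly at random, so no true clusters exist.
rng = random.Random(42)
random_cases = [(rng.random(), rng.random()) for _ in range(200)]

labels, centers = kmeans(random_cases, k=3)
cluster_sizes = [labels.count(j) for j in range(3)]
print(cluster_sizes)  # the routine still reports three groups of cases
```

The routine partitions the structureless data into the requested number of groups without complaint, which is why the choice of variables has to be justified by theory rather than by the mere existence of a solution.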

122 HENRY, TOLAN, AND GORMAN-SMITH

It may also be that collapsing information provided by
multiple family members is inconsistent with theory or the
observed association between sources of data. Variation
among sources of information may be expected in theory or
may be observed in low correlations between different
sources. Differences among sources of information may
provide important insights into the characteristics of fami-
lies and the nature of family clusters. Therefore, multiple
family members’ measures of the same constructs should be
combined only if they are both theoretically consistent and
statistically related. For example, Belsky and Hsieh (1998),
in a cluster analysis of patterns of change in marriage, found
that separate cluster solutions for husbands and wives
aligned more consistently with theory than a single solution
derived by clustering averaged husbands’ and wives’
scores.
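
One way to check the "statistically related" half of this recommendation is sketched below. It is a minimal illustration only: it computes a Pearson correlation between two informants' scores and averages them only when the correlation reaches a cutoff. The function names, the example scores, and the cutoff of 0.5 are hypothetical and are not drawn from Belsky and Hsieh (1998) or from this article; theoretical consistency still has to be judged separately.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length score lists (both must vary)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def combine_if_related(x, y, min_r=0.5):
    """Average two informants' scores only if they correlate at least min_r.

    min_r is an illustrative cutoff, not a value recommended by the article.
    Returns (r, combined_scores), with combined_scores set to None when the
    sources are too weakly related to be pooled.
    """
    r = pearson_r(x, y)
    if r >= min_r:
        return r, [(a + b) / 2 for a, b in zip(x, y)]
    return r, None


# Hypothetical husband and wife reports on the same scale.
husband = [1, 2, 3, 4, 5]
wife_agrees = [2, 4, 6, 8, 10]
wife_disagrees = [10, 8, 6, 4, 2]

r_agree, pooled = combine_if_related(husband, wife_agrees)
r_disagree, not_pooled = combine_if_related(husband, wife_disagrees)
```

When the second value returned is None, the two sources would be kept as separate variables or, as in the husbands and wives example above, clustered separately.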
Equating scaling of measures. Another issue of importance in data preparation is equating the
variables prior to entry into a cluster analysis. Debate abounds in the literature on
standardizing variables prior to clustering (Aldenderfer & Blashfield, 1984; Seber, 1984, pp.
353–355) given that unequal scaling of variables to be analyzed creates unequal variable
weights. Variables with greater ranges or larger variances may have greater influence in the
calculations used to determine clusters. On the other hand, standardizing variables so that they
have equal variances may result in solutions that do not reflect the natural clustering of the
data (Milligan, 1996). Despite the debate surrounding how best to standardize variables, there is
some agreement that dividing each variable by its range provides an acceptable method of equating
variables. Dividing variables by their ranges places all variables on the same scale while
leaving differences in variances intact. This approach is supported by Milligan and Cooper
(1988), who found that dividing by the range was superior to five other methods of
standardization.

Visually examining the data. After selecting theoretically meaningful measures and equating the
scaling of measures, it is important to visually examine the distributions. A graph or chart can
reveal important information for selecting clustering methods and interpreting the solutions.
Examining the distributions can, particularly with fewer variables, suggest the extent to which
the data are clustered and the shapes of the clusters. Two- and three-dimensional scatterplots
are available in most statistical packages. Although it becomes difficult to visualize joint
distributions when more than three variables are considered, a scatterplot matrix, or SPLOM, can
reveal the univariate and bivariate marginal distributions. For example, Figure 1 is a SPLOM of
three family relationship variables (cohesion, beliefs, and structure) from Wave 3 of the Chicago
Youth Development Study (Gorman-Smith, Tolan, & Henry, 2000). Areas of lower or higher than
expected density, such as those circled in Figure 1, may indicate clustering beyond what would be
expected in random data. This can aid in deciding whether to apply cluster analysis to the data.
In addition, visual examination can suggest whether the variables to be clustered have
multivariate normal distributions within the clusters. Such knowledge can help in deciding which
clustering methods to use.

Figure 1. Scatterplot matrix of family relationships and parenting practices scales from Wave 3
of the Chicago Youth Development Study. W3COHES = family cohesion; W3FAMBEL = family beliefs;
W3STRUCT = family structure.

Clustering Methods

The three methods of clustering are hierarchical, nonhierarchical, and overlapping (Seber, 1984,
pp. 349–350). Of these methods, only hierarchical and nonhierarchical are widely used in family
psychology research. Hierarchical and nonhierarchical clustering are distinguished by the
technique used to derive clusters from the data. However, the distinction has theoretical
implications as well. Nonhierarchical clusters are mutually exclusive categories and, therefore,
do not represent any theoretically nested structure. Hierarchical methods can be used to create
mutually exclusive groups, but nonhierarchical methods cannot represent nested structures.
Consider, for example, Baumrind’s (1991) distinction between authoritative and authoritarian
families. Any single family would be categorized as either authoritative or authoritarian. The
theoretical clustering would be nonhierarchical, and either hierarchical or nonhierarchical
methods would be appropriate. If, however, one conceptualized authoritarian and authoritative
families as subtypes of a type of family that exercised authority over their children,
hierarchically arranged clusters would be more consistent with theory.

Hierarchical clustering. Hierarchical methods are intended to reveal the nested structure of
clusters within multivariate data. Most statistical software uses joining for hierarchical
clustering. Joining begins by linking the individual observations closest to one another in a
space defined by the dimensions used in the analysis. Once these clusters are formed, they are
joined with other clusters or individual observations to create larger clusters. This process
continues until all observations are joined together into a single cluster.
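
The joining procedure, preceded by the range rescaling discussed under equating scaling of measures, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the routine of any statistical package: it uses Euclidian distances and measures the distance between two clusters by their closest pair of cases (single linkage), both of which are choices discussed later in this article. The function names and the four example families are hypothetical, and every variable is assumed to have a nonzero range.

```python
from math import dist  # Euclidian distance between two points (Python 3.8+)


def scale_by_range(rows):
    """Divide each variable (column) by its range; assumes every range is nonzero."""
    columns = list(zip(*rows))
    ranges = [max(col) - min(col) for col in columns]
    return [tuple(value / rng for value, rng in zip(row, ranges)) for row in rows]


def join_clusters(rows):
    """Agglomerative joining: repeatedly merge the two closest clusters.

    Cluster-to-cluster distance is the smallest distance between any pair of
    cases, one from each cluster (an assumption of this sketch). Returns the
    merges in order as (members_a, members_b, distance), where members are
    case indices into rows.
    """
    clusters = [[i] for i in range(len(rows))]  # start with one cluster per case
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(rows[i], rows[j]) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


# Hypothetical scores for four families on two differently scaled measures.
families = [(1, 10), (2, 20), (8, 90), (10, 100)]
scaled = scale_by_range(families)
merges = join_clusters(scaled)
for members_a, members_b, distance in merges:
    print(members_a, members_b, round(distance, 3))
```

The ordered list of merges is the information a dendrogram displays: early merges at small distances are candidate clusters, and the final merge joins all cases into a single cluster.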
SPECIAL ISSUE: CLUSTER ANALYSIS 123

Hierarchical clustering creates a representation of the structure of the data from which the
clusters to retain can be chosen. This is illustrated in Figure 2. The left panel of Figure 2 is
a scatterplot of two family variables, cohesion and supervision of children, with the cases
labeled to show their corresponding positions in the tree diagram, or dendrogram, in the right
panel. The data set contained measurements of these two variables for 15 families. Presented with
a hierarchical cluster solution, such as that in Figure 2, a researcher would most likely decide
to retain three clusters defined by (a) low cohesion and low supervision, (b) low cohesion and
high supervision, and (c) high cohesion and high supervision. Alternatively, a researcher might
combine Clusters a and b into Cluster c, regarding the former as subtypes of the latter. Such a
decision would result in clusters defined by low and high cohesion, possibly with subtypes
defined by levels of child supervision.

Figure 2. A: Bivariate scatterplot of supervision and rules on family cohesion. Cases are labeled
to allow comparison with the dendrogram in Panel B. B: SPSS dendrogram (tree diagram) from a
hierarchical cluster analysis of 15 families on measures of supervision and rules for children and
family cohesion. Single linkage (nearest neighbor) and Euclidian distances were used in the cluster
analysis.

Hierarchical clustering requires the researcher to select a distance metric, which is a unit of
measurement for expressing the distances between cases in multivariate space. Hierarchical
clustering also requires the researcher to select a way of defining the link between clusters.
The choice of distance metrics combined with the scaling of variables can greatly affect the
relative distances between cases, which in turn affect the cluster solution obtained. The most
common distance metric in family research is the Euclidian distance, which is calculated by
summing the squared differences between cases on each variable and using the square root of the
sum. Euclidian distances are used with numeric data but may also be used with binary data.
Distances used exclusively in clustering categorical variables are based on the logic of the
2 × 2 table. Common distances for categorical data are Jaccard’s (1908) measure and distances
based on the chi-square statistic and the square of the phi coefficient. Many other distance
metrics are available in hierarchical clustering programs. Each distance metric has strengths and
weaknesses. The interested reader is referred to Seber (1984, pp. 351–359), Aldenderfer and
Blashfield (1984), and Kaufman and Rousseeuw (1990, chap. 1) for more detailed discussions and
definitions of distance metrics.

We recommend Euclidian distances in cluster analysis in family psychology. Rescaling variables by
their ranges prior to calculating distances compensates for the most serious problem with
Euclidian distances, namely, their sensitivity to variations in the ranges of variables, yet
leaves sufficient information about variances to minimize the likelihood of cluster solutions
that inaccurately represent the natural clusters in the data.

In addition to the measure of distances between cases, it is necessary to define how clusters
will be linked together to form other clusters, or linkage. Whereas distance refers to the unit
of measurement, linkage refers to the point in a
cluster from which distance measures will be calculated. Because clusters contain multiple cases,
distances can be measured between the closest observations, the most distant observations, the
centroids (the points of averages of the clusters), or in other ways. Commonly used linkages in
family studies are single linkage, complete linkage, centroid linkage, and Ward’s (1963) linkage.

In single linkage, distance is measured between the observations in two clusters with the
smallest distance between them. For this reason, single linkage is sometimes called nearest
neighbor linkage. Complete linkage is the opposite of single linkage and, as a result, is
sometimes called farthest neighbor linkage. It involves measuring the distance between clusters
by using the observations with the greatest distance between them. Ward’s (1963; Wishart, 1969)
method of linkage links clusters together on the basis of the degree of similarity between
observations in the same cluster. Ward’s linkage minimizes the within-cluster sum of squares of
each cluster when clusters are joined together.

As is the case with distance metrics, the choice of linkage method affects the results. In
particular, it may affect whether the clusters obtained are unique and the degree of sensitivity
to areas of greater density in the multivariate distribution. Morgan and Ray (1995) noted that in
a Monte Carlo study of linkage methods, only single linkage produced unique cluster solutions.
Other methods could return different solutions from the same data set. However, if the data are
not tightly clustered, as is often the case in family research, single linkage may not clearly
differentiate among clusters that exist in the data.

We recommend single linkage as an initial step in family psychology studies that use hierarchical
clustering. If single linkage does not return identifiable clusters, the clusters in the data may
be variations in density rather than groups separated by empty space. In that case, further
analysis with other linkage methods can be undertaken. If it is believed that the clusters in the
data are roughly multivariate normal in shape, clustering with Ward’s (1963) linkage would be a
reasonable second step. If visual examination suggests that clusters are not multivariate normal
in shape, complete linkage or centroid linkage might be a better choice for further analysis.
Interested readers may consult Aldenderfer and Blashfield (1984), Seber (1984), or other texts on
clustering for more specific guidance.

Nonhierarchical clustering. Nonhierarchical clustering differs from hierarchical clustering in
that it does not produce a nested structure of the data. Rather, nonhierarchical clustering
results in discrete clusters. Unlike hierarchical methods, nonhierarchical clustering requires
that the researcher specify a number of clusters to extract prior to conducting the analysis.
Nonhierarchical clustering has both advantages and disadvantages. Nonhierarchical clustering
routines do not allow the researcher as much latitude in choosing distance metrics and ways of
differentiating among clusters as do hierarchical routines. However, some nonhierarchical
clustering programs allow researchers to specify initial cluster centers and to choose distance
metrics. In widely used statistical software such as SAS, SPSS, and SYSTAT, nonhierarchical
clustering is most often implemented with the K-means algorithm (MacQueen, 1967). With this
method, initial cluster centers (values representing the average of each cluster on each
variable) are either entered by the researcher or chosen at random by the program. Once initial
centers are chosen, the program assigns cases to the cluster whose center is nearest. Assigning
the cases in this manner usually changes the cluster centers, and thus objects are reassigned to
clusters and the centers are updated again. This process continues until no objects change their
cluster memberships.

There is no consensus on methods for determining whether a nonhierarchical cluster solution fits
the data. Methods for comparing nonhierarchical cluster solutions are not currently implemented
in statistics programs. Recently, however, some helpful procedures have been developed, and they
are being increasingly used in family psychology research. Most procedures involve comparing the
mean distances between individual observations and their cluster centers for different numbers of
clusters. These distances may be compared through a scree plot (Breckenridge, 2000; Mandara,
2003) or a t test or F test (Beale, 1969; Gorman-Smith, Tolan, & Henry, 2000).

In family psychology research, nonhierarchical clustering has often been used as a second step in
a cluster analysis. First, hierarchical methods in combination with theoretical discussion are
used to determine how many clusters to expect and where to place the initial cluster centers.
Next, nonhierarchical clustering, using the predetermined number of clusters and initial centers,
is used to assign observations to clusters. This combination of methods capitalizes on the
strengths of both methods and compensates for their weaknesses.

Validating Cluster Solutions

Ensuring validity requires attention from the beginning of a cluster approach. Researchers should
attend to the content validity of the solution at each step in the analysis because multiple
cluster solutions and potentially acceptable numbers of clusters are usually possible from a
single data set. Selecting an appropriate solution requires an interplay between theory and
analysis in the attempt to maximize the content validity of the solution.

One approach to validating a cluster solution is to reanalyze the data with a different
clustering method. This approach is sometimes referred to as confirmatory cluster analysis
(Fisher & Ransom, 1995). Reanalyzing the data using a different method can determine the extent
to which solutions produced by the two methods converge. For example, Fisher and Ransom (1995)
used hierarchical clustering with Ward’s (1963) linkage to derive four clusters of families on 11
scales. To confirm the solution, they conducted a K-means analysis, specifying four clusters, and
found substantial agreement between the solutions in assigning cases to clusters. A second
approach to validation, if the sample is large enough, is to randomly split the sample,
analyze each half separately, and compare the solutions (Mandara, 2003).

A third approach is to confirm the solution on a second sample. Gorman-Smith, Tolan, and Henry
(1999a) took this approach in a cluster analysis of relationship patterns among young urban
couples. They conducted an initial cluster analysis using a sample of reports from 203
individuals on the characteristics of their romantic relationships. They then pooled partners’
measures in a sample of 46 couples and cluster analyzed that data set, examining fit measures to
determine whether the same number of clusters fit the data. Next, they assigned partners in the
couples’ data set to the clusters with centers chosen by the individual solution. They then
conducted a multivariate analysis of variance of the variables used in clustering with terms for
the sample (individual or couple), cluster, and the Sample × Cluster interaction as predictors.
They found a significant cluster effect and nonsignificant effects for sample and the interaction
between sample and cluster. They concluded from this that the individual solution also fit the
pooled couples’ data and that a significant Sample × Cluster interaction would have indicated
that the solution did not fit.

A fourth approach to validation is to test the criterion-related validity of the solution, that
is, to use the solution to predict variables of interest in ways consistent with theory
(Aldenderfer & Blashfield, 1984, pp. 62–74; Mandara, 2003). Several family studies have used such
an approach, in which evidence for the validity of a cluster solution comes from the ability of
the clusters to predict outcomes that are unrelated to the variables used in the clustering but
that are theoretically related to the clusters. For example, Danseco and Holden (1998), in a
study of types of homeless families, found that children whose families were classified as
resilient by a cluster analysis had higher Stanford-Binet IQ scores and lower problem behavior
scores on the Child Behavior Checklist than did children in families classified as “getting by”
and “at risk.” They suggested that these differences reflected the likely effects of family
resilience and, therefore, supported the validity of the cluster solution.

Interpreting Cluster Solutions

One of the most difficult aspects of using cluster analysis is interpreting and comparing
different cluster solutions. Although preparing and conducting the analysis are critical steps,
naming clusters and interpreting their differences is equally critical yet highly subjective,
which adds to the difficulty. One reason for this difficulty is that cluster analysis can contain
more dimensions than can be easily interpreted. In addition, multiple cluster solutions are often
available, and, as noted, little statistical guidance exists for choosing one solution over
another on the basis of their relative fit to the data.

Figure 3. Radar plots of two family risk clusters, plotted in SAS PROC GRADAR from cluster
means published in Gorman-Smith, Tolan, Loeber, and Henry (1998). This type of plot allows
simultaneous presentation of cluster means on all variables used in the analysis. Variables in this
plot are identified by their clock positions. The “deviant” family cluster is identified by high means
on deviance at all waves, and the “disruption and conflict” cluster is identified by high means on
disruption and moderate means on conflict at all waves. W = wave.

Grouped bar charts are useful visual aids when only a few variables are clustered, but they
become cumbersome with increasing numbers of variables and clusters. When one is faced with
numerous variables and clusters, a helpful plot is variously called a radar plot (SAS and MS
Excel), star plot (SYSTAT), or sun ray plot (STATSOFT). Figure 3 is a radar plot of two clusters
representing patterns of family problems drawn from a cluster analysis of four family problem
measures gathered annually for 4 years. The data used to construct these plots were reported by
Gorman-Smith, Tolan, Loeber, and Henry (1998). Each plot represents a single cluster, and each
ray of the plot represents a single variable. In Figure 3, variables are identified with
numbers in the legend and may be found on the plot. Variable 1 (Wave 1 conflict) is displayed on
the plot in the 12:00 position, whereas Variable 5 (Wave 2 conflict) is displayed in the 3:00
position. The axis for each variable radiates out from the center of the plot, and the mean of
the cluster is the intersection of the “star” with the axis. The axes can be calibrated if
desired. This plot clearly shows the differences between the clusters on each of the 16 variables
used. These types of plots can be used to display up to 200 variables (in SAS) simultaneously,
which is more than sufficient for most cluster analysis applications in family psychology.

Interpretation begins by seeking the distinctive features of each cluster. For example, the first
cluster in Figure 3 has elevated levels of deviance at all waves, most conspicuously at Wave 2
(Variable 6 in the left-hand plot). The second cluster, in contrast, has elevated levels of
disruption at all waves. These features led Gorman-Smith et al. (1998) to interpret the first
cluster as a group of families characterized by deviance and the second as families characterized
by chronic disruption.

Cluster analyses often return clusters that show only slight differences in levels of multiple
variables rather than distinct differences, such as the elevated levels of deviance and
disruption seen in Figure 3. In such cases, interpretation may be eased by conducting post hoc
comparisons using t tests of all combinations of cluster means on each variable. Such a group of
tests should be protected against Type I error inflation by reducing the alpha level for
interpretation in accordance with the number of tests conducted. Post hoc tests such as these are
not implemented in clustering programs. They may be conducted by saving the cluster identifier (a
variable that indicates the cluster to which each case has been assigned) and testing all
differences between clusters on all variables used in the cluster analysis.

Identifying the interpretable differences among clusters creates a basis for dialogue between
theory and the solutions produced by the analysis. This dialogue is the final stage in
interpreting cluster solutions. In this stage, the theoretical considerations raised in the early
stages of the analysis may be raised again in light of the results. Were all of the variables
chosen for the analysis optimal, or should certain variables be added or deleted? Which of
several possible solutions is most interpretable in light of the existing theories (Lee, Rice, &
Gillespie, 1997)? Such dialogue often leads to further cluster analyses as the correspondence
between analysis and interpretation is refined.

Using Cluster Analysis in Family Psychology Research

The methods of hierarchical and nonhierarchical clustering described here have been applied to
various problems in family psychology, including multivariate characterization of families,
studying families over time, interpreting complex qualitative data, data reduction, and as an
alternative to multivariate interactions in statistical models. In this section, we briefly
describe studies illustrating each of these applications.

Multivariate Characterization of Families

The most frequent use of clustering methods in family psychology over the past 5 years has been
to characterize parents and families along multiple dimensions of theoretical interest, producing
types of families. These types are then used to predict risk or other outcomes. Table 1
summarizes the cluster solutions obtained in these studies. To organize the clusters, we order
them in Table 1 by their levels of control and closeness (e.g., high levels of control and low
closeness or low control and high closeness; Maccoby & Martin, 1983). Baumrind (1991) found four
types of

Table 1
Cluster Solutions From Studies Deriving Family Types

Column headings: Study; Levels of control and closeness (Low control, low closeness; Low control,
high closeness; High control, low closeness; High control, high closeness); Other

Brenner & Fox (1999): Low discipline, nurturance, and expectations (31%); Low discipline and
expectations, high nurture (27%); High discipline and expectations, low nurture (21%); Low
discipline, high nurture and expectatio

Hamid et al. (2003): Unstructured, low control (18%); Conflict–control (26%); Structured,
cohesive, expressive, and recreation-oriented (26%); Structured, cohesive, and low conflict (31%)

Mandara & Murray (2002): Defensive–neglectful (21%); Conflictive–authoritarian (34%);
Cohesive–authoritative (38%); Not classified (6%)

Gorman-Smith, Tolan, & Henry (2000): Struggling families (20%); Task-oriented families (25%);
Exceptionally functioning families (27%); Moderately functioning families (27%)

families based on measures of parenting practices that were (Gorman-Smith, Tolan, Henry, & Florsheim, 2000) later in
consistent with this conceptualization and have been influ- adolescence.
ential in interpreting cluster analyses in family research: (a) Despite the apparent convergence in this research, there is
authoritative families who had control of children but also not yet enough evidence to determine the extent to which
allowed independence, (b) authoritarian families who focused primarily on control and evaluation of children, (c) permissive families who were low on control and high on support for children’s independence, and (d) neglectful families, defined by low control and low support for children’s independence.

Brenner and Fox (1999), citing Darling and Steinberg’s (1993) distinction between parenting styles (systems of attitudes and beliefs) and parenting practices (specific behaviors), clustered three subscales of the Parenting Behavior Checklist (PBC; Fox, 1994) to derive a typology of parenting practices. They used a sample of 1,056 mothers of young children who completed the PBC as part of a validation study for the instrument. Using hierarchical clustering and Ward’s (1963) linkage, they found a four-cluster solution (see Table 1) that they interpreted in light of Baumrind’s (1991) typology of parenting styles. The family clusters were at significantly different levels in terms of socioeconomic status, number of children, and child behavior problems.

In another study drawing on Baumrind’s (1991) typology, Mandara and Murray (2002) analyzed numeric data from 10 subscales of the Family Environment Scale (Moos & Moos, 1986) and 8 subscales of the Black Family Process Q-Sort (Peacock, Murray, Ozer, & Stokes, 1996). Their sample consisted of 116 African American families, each with a 15-year-old male or female adolescent. The authors used adolescent reports of family functioning in the cluster analysis. With initial cluster centers from a hierarchical solution, these investigators used K-means to extract three clusters of families (see Table 1). They compared these family clusters with Baumrind’s (1991) authoritative, authoritarian, and neglectful parenting styles. The clusters differentiated families in regard to demographics, parenting practices, church attendance, youth self-esteem, ethnic identity, and personality variables.

Hamid, Yue, and Leung (2003), in a study of adolescent coping, compared their clusters with Baumrind’s (1991) parenting styles. These investigators clustered the perceptions of 297 Chinese adolescents of their families on 10 Family Environment Scale subscales and found four clusters (see Table 1) that differed in regard to coping style.

In a study using repeated measurements, Gorman-Smith, Tolan, and Henry (2000) considered combined parent and child reports on three family relationship dimensions (cohesion, beliefs, and structure) and two parenting practices dimensions (discipline and monitoring) in deriving family types among 280 inner-city families with male adolescents. Using four waves of these five measures, they determined the number of clusters with hierarchical clustering and extracted clusters with nonhierarchical clustering. This analysis produced four clusters (see Table 1) that differed significantly in regard to delinquent involvement (Henry, Tolan, & Gorman-Smith, 2001) and positive outcomes

these seemingly consistent family types span ethnicities, cultures, and levels of socioeconomic status or whether relations to child and parent functioning are consistent across demographic differences. The value of a large-scale cluster-analytic study employing a wide array of measures of family functioning and parenting from multiple sources and targeting families that cross cultural, ethnic, geographic, and economic lines could be considerable. Such a study, or future cluster-analytic family studies, could help determine the extent to which family configurations or cluster types and their relations to outcomes can be generalized to different populations.

Studying Families Over Time

An infrequent, but potentially important, use of cluster analysis in family studies is investigating families over time, or the trajectories of family development. Including multiple time points in analyses is a natural conceptual step beyond characterizing families at a single point in time, but this has rarely been done. We could locate only four cluster studies that included development in any form. Two of these studies were published more than 5 years ago.

Gorman-Smith et al. (1998; 1999b) used scales that replicated Loeber and Stouthamer-Loeber’s (1986) measures of four family risk factors: deviance, disruption, neglect, and conflict. With data from a sample of 288 male inner-city residents, they entered data from four annual waves into a K-means analysis and adapted Wishart’s (1982) stopping criterion in determining the number of clusters to retain. They identified four clusters of family risk: families with minimal problems (48%), families characterized by disruption (24%), families with deviant beliefs and behaviors (16%), and families with multiple problems (13%). Notably, change in scores over time was of negligible importance in interpreting the clusters.

Belsky and Hsieh (1998) used cluster analysis to examine couples’ marital functioning over the course of their children’s development, from 10 months through 5 years of age. They entered four waves of love and conflict measures into separate cluster analyses for reports by husbands and by wives. Through hierarchical clustering, they found three patterns of change, replicated in husbands’ reports of love and wives’ reports of conflict: (a) stayed good, (b) good gets worse, and (c) bad to worse. Two of these patterns (stays good and bad to worse) were differentiated by husbands’ and wives’ levels of agreeableness and neuroticism. Families with the longitudinal patterns of “stays good” and “good gets worse” differed in regard to levels of unsupported coparenting.

Baxter, Braithwaite, and Nicholson (1999) investigated the developmental trajectories of blended families by analyzing the content of interviews revealing major “turning points” (events considered by participants to have been
important in becoming a family). Using hierarchical clustering of multiple turning points over multiple waves, they identified five trajectories in the first 48 months of being a family. Families with different trajectories differed in variability or turbulence over 4 years as well as direction. For example, families with an accelerated trajectory rapidly developed a sense of being a family, whereas those with a prolonged trajectory developed a sense of family much more slowly. Families with a declining trajectory lost an initial sense of being a family, and those with a stagnating trajectory never developed one. Finally, those with a turbulent trajectory vacillated between extremes of cohesion and alienation.

These studies illustrate the potential importance of development in characterizing family types. All three studies suggest that levels of functioning at a single point in time and change in functioning are associated in a complex manner. In the Gorman-Smith, Tolan, and Henry (2000) study, the highest and lowest functioning families both showed stability over 4 years of assessment, whereas the moderately functioning families revealed change over time. Baxter et al. (1999) also found an association between level and change in functioning among stepfamilies, and Belsky and Hsieh (1998), in a study of married couples, found such an association as well. In each study, the groups at the extremes of the dimensions had stable levels of variables over time. In the two latter studies, the pattern of change rather than the relative levels of variables defined a family type.

Interest in multivariate longitudinal clustering is relatively recent. New methods have become available that may assist in modeling the developmental trajectories of families. One promising method for family research is latent transition analysis (LTA), developed by Collins et al. (Collins, Lanza, Schafer, & Flaherty, 2002; Collins, Hyatt, & Graham, 2000).¹ With LTA, one can describe the characteristics of clusters and the likelihood that cases will become members of other clusters at other points in time. A second, newer clustering method for longitudinal analysis is PROC TRAJ (Jones, Nagin, & Roeder, 2001; Nagin, 1999). Implemented through the SAS system, PROC TRAJ clusters multiple repeated measures, creating trajectory classes (groups defined by their levels and growth patterns on the variable being clustered). Fit statistics are provided, allowing one to determine the optimal number of clusters. PROC TRAJ implements distributions appropriate for count data and data whose distributions depart substantially from normal. Approaches similar to LTA and PROC TRAJ can be implemented with the M-Plus statistical program (B. O. Muthén & Muthén, 2000).

¹ A version of LTA for Windows can be downloaded from the Pennsylvania State University Methodology Center Web site: http://methodology.psu.edu/downloads/winlta.html

An Interface Between Qualitative and Quantitative Methods

Clustering approaches also have potential advantages when the interest is in relating qualitative and quantitative methods. Cluster analysis has been used in family psychology to confirm typologies suggested by qualitative methods. Hawkins, Marshall, and Allen (1998) applied cluster analysis to the Domestic Labor Questionnaire in an attempt to validate Hochschild’s (1989) qualitative findings on the division of domestic labor between married couples.

Cluster analysis has infrequently been used for quantitative analysis of qualitative data in family psychology. We could find only a single example of the direct use of clustering for analyzing qualitative data in family psychology. To classify the expectations of mothers of children with hearing impairments, Dromi and Ingber (1999) applied cluster analysis to interview data that had been coded.

As is evident from the study of Baxter et al. (1999), noted earlier, interviews with or about families produce rich data whose latent structure can be revealed with clustering methods. Data consisting of a mix of categorical and numeric variables can also be cluster analyzed, producing categorical variables that can be used in further analyses. Qualitative interviews must be coded prior to entry into cluster analysis, but such coding may be a simple indicator of the presence or absence of words, phrases, or concepts. Qualitative coding programs such as NU-DIST (QSR International, 2004) have substantial coding capabilities and can handle large amounts of qualitative data. They can also produce output of coded data that can be used by statistical programs. Some qualitative analysis programs are beginning to incorporate clustering and concept mapping routines. As noted earlier, distances in qualitative data are defined by the degree of similarity or association between categorical variables.

Data Reduction for Small Sample Sizes

Smaller sample sizes may produce unreliable correlations that are sensitive to outlying observations and inappropriate for principal components analysis or factor analysis. With smaller samples, clustering methods may provide a desirable alternative to factor analysis for data reduction. Hierarchical methods can be used to cluster rows or columns (cases or variables) in the data matrix. Clustering items in a scale returns a representative hierarchical structure of the items that can be used in a manner analogous to a factor analysis. Cordingley, Wearden, Appleby, and Fisher (2001) used cluster analysis to test a questionnaire measuring family responses to patients with chronic fatigue syndrome. They recruited 78 physician-diagnosed patients, who completed the questionnaire on two occasions. Believing that factor analysis of the 30-item Family Response Questionnaire would require a sample at least twice this size (Tabachnik & Fidell, 1989, p. 603), they instead used hierarchical cluster analysis to determine how many subscales to retain. The analysis suggested four scales instead of the original five. The cluster analysis showed that the four revised scales were nested within two higher order constructs.
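The idea of clustering the items of a scale (columns) rather than the respondents (rows), discussed under data reduction for small samples, can be sketched in a few lines of Python. This is only an illustration under stated assumptions and is not the analysis reported by Cordingley et al. (2001): the data are simulated (78 hypothetical respondents, 8 hypothetical items built from two simulated constructs), the distance between items is assumed to be 1 − |r|, and average linkage is an arbitrary choice among the available linkage methods. The function names `pearson` and `cluster_items` are invented for this example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_items(columns, n_clusters):
    """Average-linkage agglomerative clustering of variables (columns).

    The distance between two items is 1 - |r| (an assumption), so items
    that correlate strongly in either direction are treated as close.
    """
    k = len(columns)
    dist = [[1.0 - abs(pearson(columns[i], columns[j])) for j in range(k)]
            for i in range(k)]
    clusters = [[i] for i in range(k)]  # start with one cluster per item
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return [sorted(c) for c in clusters]


# Simulated (hypothetical) data: items 0-3 reflect one construct and
# items 4-7 another, each with added noise.
rng = random.Random(0)
n_respondents = 78
items = [[] for _ in range(8)]
for _ in range(n_respondents):
    f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
    for j in range(8):
        factor = f1 if j < 4 else f2
        items[j].append(factor + 0.5 * rng.gauss(0, 1))

item_clusters = cluster_items(items, n_clusters=2)
print(item_clusters)
```

Each resulting group of items plays a role loosely comparable to a factor. With real questionnaire data, the number of groups to retain would have to be justified separately, for example by inspecting the full merge sequence rather than fixing `n_clusters` in advance.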
An Alternative to Multivariate Interactions in Linear Models

Clustering may also provide a useful alternative when researchers are exploring the interactions between multiple predictors in a statistical model. This is particularly important in family research because, in multidimensional conceptions of families, the full range of interacting family variables may not be measured well, particularly when sample sizes are smaller. Linear models can also mistakenly identify interaction effects when the relations between variables are nonlinear (McClelland & Judd, 1993) or when interactions and covariates are included in the same analysis (Yzerbyt, Muller, & Judd, 2004). Clustering may be used as an alternative to entering multiple interactions in linear models. Cases can be clustered through the use of multiple, relevant variables, and then dummy codes or effect codes for cluster membership can be entered into the model, easing interpretation of differences.

For example, Henry et al. (2001) used clustering in place of interaction terms among multiple predictors. They entered five measures of family relationships and parenting practices into a linear model predicting later property and violent crime in a sample of urban male adolescents. Increments in R² values over intercept-only models determined that the main effects of these five variables accounted for 6.5% of the variance in property crime and 7.4% of the variance in violent crime. When they entered dummy codes for clusters based on these same five variables into the model, they could account for an additional 4.1% of the variance in individual property crime and an additional 4.5% of the variance in individual violent crime. These investigators concluded that the clusters were accounting for the variance that would have been explained by the interactions among the five variables.

Cautions in Applying Cluster Analysis in Family Psychology Research

Two decades ago, Aldenderfer and Blashfield (1984, pp. 14–17) offered three notes of caution in using cluster analysis, and they remain important today in family research. First, clustering methods are relatively simple heuristics that are not supported by extensive statistical reasoning. Because of this, clustering methods are often criticized for producing results that are not readily generalized to larger populations. At the very least, unvalidated cluster solutions should be regarded as explorations and descriptions of the structure of data rather than as hypothesis tests or confirmations of theoretical structure. In particular, the significance tests provided by clustering routines in most statistical programs (e.g., the analysis of variance table provided in the SPSS K-means routine) should not be considered a hypothesis test of the validity of the cluster solution. They only test cluster separation on the variables the clusters were designed to separate maximally.

Second, clustering methods can produce multiple solutions from the same data. As we noted earlier, the solutions produced by hierarchical clustering are sensitive to the linkage method used (Morgan & Ray, 1995). The solutions produced by nonhierarchical clustering are sensitive to several factors. K-means clustering is sensitive to nonnormality because it uses a least squares criterion for calculating the distances between cluster centers. In SAS PROC FASTCLUS, it is possible to specify criteria other than least squares in nonhierarchical clustering, which can reduce sensitivity to nonnormality. With small sample sizes, K-means is also sensitive to the order of entry of variables. As noted earlier, different starting values (initial cluster centers) can significantly affect the solutions obtained with K-means (Steinley, 2003). These limitations can be overcome, to a certain extent, by obtaining cluster centers using hierarchical clustering and then using these centers as initial centers in a nonhierarchical (K-means) analysis. Two studies reviewed here used this combination of clustering methods (Gorman-Smith, Tolan, & Henry, 2000; Mandara & Murray, 2002).

Third, and most important, no definitive test exists to determine whether cluster solutions reveal meaningful clusters or merely clumping that is essentially random. The nature of clustering itself in multivariate distributions complicates the use of clustering methods in family psychology. To illustrate what we mean by “the nature of clustering,” Figure 4 shows three points on a continuum of aggregation in bivariate distributions: (a) aggregated, (b) regularly distributed, and (c) randomly distributed. The random distribution is, as its name suggests, a distribution of uniform random numbers between 0 and 1. We created the regular distribution by transforming a random distribution in a manner that decreased its clustering. We created the aggregated distribution by transforming a random distribution to increase its degree of clustering. Figure 4 illustrates that random data are not regularly or uniformly spaced but have some degree of clustering (Cottam, Curtis, & Cantana, 1957). Random data from a normal distribution, such as that expected in many psychological measures, show an even higher degree of clustering than do these random data from a uniform distribution.

Figure 4. Bivariate distributions showing varying degrees of clustering.

The degree of clustering normally present in random data complicates the task of differentiating meaningful natural clusters from random clustering. Adding to this complication is the fact that psychological data do not tend to be tightly clustered. That is, clustered distributions, when they do occur in psychology, tend to have fairly subtle degrees of clustering, areas of slightly greater density rather than the clear groups seen in the first panel of Figure 4. Clustering methods are not sensitive to the difference between clustering found in random distributions and that found when real clusters exist. Although no definitive test exists to determine whether true clustering is present in data (Mandara, 2003), several existing and emerging methods, when used together, can increase one’s confidence in the appropriateness of cluster analysis (Breckenridge, 2000; Mandara, 2003). In addition, newer programs such as M-Plus (L. K. Muthén & Muthén, 1998) and Latent Gold (Vermunt & Magidson, 2003)² conduct latent class analysis (McCutcheon, 1987) or latent class clustering and provide fit statistics that can aid in determining how many clusters or classes to retain.

² Vermunt (1996) introduced a latent class approach to clustering that can be used with categorical, continuous, ordinal, or mixed data types and does not require rescaling variables. It provides chi-square, Bayesian information criterion, and similar statistics that allow comparisons between different cluster solutions. This method is available in commercially marketed (Vermunt & Magidson, 2003; online at www.latentgold.com) and free downloadable packages (www.uvt.nl/faculteiten/fsw/organisatie/departementen/mto/software2.html).

Conclusion

We have reviewed the steps in conducting a cluster analysis and have reviewed the most commonly used clustering methods in family research. We have reviewed the uses of clustering in the family psychology literature during the past 5 years and have discussed some notes of caution for using clustering methods in family research. The field of clustering, classification, and taxonometrics is vast, varied, and rapidly growing. There are other clustering methods (e.g., see Meehl, 1992; Stutz & Cheeseman, 1995) and potential uses for clustering, such as detection of multivariate outliers (Rocke, 2002) that we did not review here. New methods are emerging to determine appropriate numbers of clusters and to validate cluster solutions (Breckenridge, 2000; Mandara, 2003) that may, in time, be implemented in statistical packages. Family psychology research would benefit from conversation with other sciences that have longer histories of using clustering methods, such as forestry, ecology, or epidemiology (Hartigan, 1975). Such dialogue may lead to further refinements of clustering methods, although cluster analysis in other disciplines necessarily reflects the particular needs and biases of those disciplines.

In time, clustering methods may be an important component of statistical methods that help provide an understanding of the complexities of family relationships. Clustering has much to offer, although it is still rarely applied and is not well accepted by some in the field as a strong statistical method. Clustering fills an important void in linear statistical methods, which cannot account for subsamples or the natural clusters that occur in data. Some of the evidence reviewed here suggests that clustering studies may reveal more general groupings in samples that vary in nationality, ethnicity, and residence, as well as methods and measures. Recent advances have improved clustering methods and have added and refined methods for determining an appropriate number of clusters and interpreting cluster solutions. These advancements, along with improved measures of family characteristics, should enhance the role of clustering methods in family research.

References

Aldenderfer, M. S., & Blashfield, R. K. (1984). Cluster analysis. Beverly Hills, CA: Sage.
Arabie, P., & Hubert, L. J. (1992). Combinatorial data analysis. Annual Review of Psychology, 43, 169–203.
Bateson, G., Jackson, D. D., Haley, J., & Weakland, J. (1956). Towards a theory of schizophrenia. Behavioral Science, 1, 251–264.
Baumrind, D. (1991). The influence of parenting style on adolescent competence and substance use. Journal of Early Adolescence, 11, 56–95.
Baxter, L. A., Braithwaite, D. O., & Nicholson, J. H. (1999). Turning points in the development of blended families. Journal of Social and Personal Relationships, 16, 291–313.
Beale, E. M. L. (1969). Euclidian cluster analysis. Bulletin of the International Statistical Institute, 43, 92–94.
Belsky, J., & Hsieh, K. (1998). Patterns of marital change during the early childhood years: Parent personality, coparenting, and division-of-labor correlates. Journal of Family Psychology, 12, 511–528.
Bergman, L. R. (1996). Studying persons-as-wholes in applied research. Applied Psychology: An International Review, 45, 331–334.
Bergman, L., & Magnusson, D. (1997). Person-oriented research in developmental psychopathology. Development and Psychopathology, 9, 291–319.
Bowen, M. (1960). A family concept of schizophrenia. In D. D. Jackson (Ed.), The etiology of schizophrenia (pp. 346–372). New York: Basic Books.
Bray, J. H., Maxwell, S. E., & Cole, D. (1995). Multivariate statistics for family psychology research. Journal of Family Psychology, 9, 144–160.
Breckenridge, J. N. (2000). Validating cluster analysis: Consistent replication and symmetry. Multivariate Behavioral Research, 35, 261–285.
Brenner, V., & Fox, R. A. (1999). An empirically-derived classification of parenting practices. Journal of Genetic Psychology, 160, 343–356.
Cairns, R. B., Bergman, L. R., & Kagan, J. (Eds.). (1998). Methods and models for studying the individual. Thousand Oaks, CA: Sage.
Collins, L. M., Hyatt, S. L., & Graham, J. W. (2000). LTA as a way of testing models of stage-sequential change in longitudinal data. In T. D. Little, K. U. Schnabel, & J. Baumert (Eds.), Modeling longitudinal and multiple-group data: Practical issues, applied approaches, and specific examples (pp. 147–161). Hillsdale, NJ: Erlbaum.
Collins, L. M., Lanza, S. T., Schafer, J. L., & Flaherty, B. P. (2002). WinLTA user’s guide. University Park: Methodology Center, Pennsylvania State University.
Cordingley, L., Wearden, A., Appleby, L., & Fisher, L. (2001). The Family Response Questionnaire: A new scale to assess the responses of family members to people with chronic fatigue syndrome. Journal of Psychosomatic Research, 51, 417–424.
Cottam, G., Curtis, J. T., & Cantana, A. J., Jr. (1957). Some sampling characteristics of a series of aggregated populations. Ecology, 38, 610–622.
Danseco, E. R., & Holden, E. W. (1998). Are there different types of homeless families? A typology of homeless families based on cluster analysis. Family Relations: Journal of Applied Family and Child Studies, 47, 159–165.
Darling, N., & Steinberg, L. (1993). Parenting style as context: An integrative model. Psychological Bulletin, 113, 487–496.
Dromi, E., & Ingber, S. (1999). Israeli mothers’ expectations from early intervention with their preschool deaf children. Journal of Deaf Studies and Deaf Education, 4, 50–68.
Fisher, L., & Ransom, D. C. (1995). An empirically derived typology of families: I. Relationships with adult health. Family Process, 34, 161–182.
Fox, R. A. (1994). Parenting Behavior Checklist manual. Austin, TX: Pro-Ed.
Gorman-Smith, D., Tolan, P. H., & Henry, D. B. (1999a, September). Patterns of coupling among late adolescent youth: Pathways to risk for relationship aggression. Paper presented at the annual meeting of the Life History Research Society, Kauai, HI.
Gorman-Smith, D., Tolan, P. H., & Henry, D. (1999b). The relation of community and family to risk among urban-poor adolescents. In P. Cohen, C. Slomkowski, & L. Robbins (Eds.), Where and when: Influence of historical time and place on aspects of psychopathology (pp. 349–367). Hillsdale, NJ: Erlbaum.
Gorman-Smith, D., Tolan, P. H., & Henry, D. B. (2000). A developmental-ecological model of the relation of family functioning to patterns of delinquency. Journal of Quantitative Criminology, 16, 169–198.
Gorman-Smith, D., Tolan, P. H., Henry, D. B., & Florsheim, P. (2000). Patterns of family functioning and adolescent outcomes among urban African American and Mexican American families. Journal of Family Psychology, 14, 436–457.
Gorman-Smith, D., Tolan, P., Loeber, R., & Henry, D. (1998). The relation of family problems to patterns of delinquent involvement among urban youth. Journal of Abnormal Child Psychology, 26, 319–333.
Haley, J. (1971). Family therapy: A radical change. In J. Haley (Ed.), Changing families: A family therapy reader (pp. 272–284). New York: Grune & Stratton.
Hamid, P. N., Yue, X. D., & Leung, C. M. (2003). Adolescent coping in different Chinese family environments. Adolescence, 38, 111–130.
Hartigan, J. A. (1975). Clustering algorithms. New York: Wiley.
Hawkins, A. J., Marshall, C. M., & Allen, S. M. (1998). The Orientation Toward Domestic Labor Questionnaire: Exploring dual-earner wives’ sense of fairness about family work. Journal of Family Psychology, 12, 244–258.
Henry, D., Tolan, P. H., & Gorman-Smith, D. (2001). Longitudinal family and peer group effects on violence and nonviolent delinquency. Journal of Clinical Child Psychology, 30, 172–186.
Hochschild, A. (1989). The second shift: Working parents and the revolution at home. New York: Viking Press.
Jaccard, P. (1908). Nouvelles recherches sur la distribution florale [New research on the floral distribution]. Bulletin sur le Societe Vaudoise Sciencia Naturale, 44, 223–270.
Jones, B. L., Nagin, D. S., & Roeder, K. (2001). A SAS procedure based on mixture models for estimating developmental trajectories. Sociological Methods and Research, 29, 374–393.
Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New York: Wiley.
Lee, J. W., Rice, G. T., & Gillespie, V. B. (1997). Family worship patterns and their correlation with adolescent behavior and beliefs. Journal for the Scientific Study of Religion, 36, 372–381.
Loeber, R., & Stouthamer-Loeber, M. (1986). Family factors as correlates and predictors of juvenile conduct problems and delinquency. In M. Tonry & N. Morris (Eds.), Crime and justice (pp. 29–149). Chicago: University of Chicago Press.
Maccoby, E. E., & Martin, J. (1983). Socialization in the context of the family: Parent-child interaction. In P. H. Mussen (Series Ed.) & E. M. Hetherington (Vol. Ed.), Handbook of child psychology: Vol. 4. Socialization, personality, and social development (4th ed., pp. 1–101). New York: Wiley.
MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In L. M. Le Cam & J. Neyman (Eds.), Proceedings of the 5th Berkeley Symposium on Mathematical Statistics (Vol. 1, pp. 281–297). Berkeley: University of California Press.
Mandara, J. (2003). The typological approach in child and family psychology: A review of theory, methods, and research. Clinical Child and Family Psychology Review, 6, 129–146.
Mandara, J., & Murray, C. B. (2002). Development of an empirical typology of African American family functioning. Journal of Family Psychology, 16, 318–337.
McClelland, G. H., & Judd, C. M. (1993). Statistical difficulties of detecting interactions and moderator effects. Psychological Bulletin, 114, 376–390.
McCutcheon, A. R. (1987). Latent class analysis. Beverly Hills, CA: Sage.
Meehl, P. E. (1992). Factors, taxa, traits, and types, differences of degree and differences in kind. Journal of Personality, 60, 117–174.
Milligan, G. W. (1996). Clustering validation: Results and implications for applied analyses. In P. Arabie, L. J. Hubert, & G. De Soto (Eds.), Clustering and classification (pp. 341–375). River Edge, NJ: World Scientific.
Milligan, G. W., & Cooper, M. C. (1988). A study of standardization of variables in cluster analysis. Journal of Classification, 5, 182–204.
Moos, R. H., & Moos, B. S. (1986). Family Environment Scale manual. Palo Alto, CA: Consulting Psychologists Press.
Morgan, B. T., & Ray, A. P. G. (1995). Non-uniqueness and inversion in cluster analysis. Applied Statistics, 44, 117–134.
Muthén, B. O., & Muthén, L. K. (2000). Integrating person-centered and variable-centered analysis: Growth mixture modeling with latent trajectory classes. Alcoholism: Clinical and Experimental Research, 24, 882–891.
Muthén, L. K., & Muthén, B. O. (1998). Mplus user’s guide. Los Angeles: Muthén & Muthén.
Nagin, D. S. (1999). Analyzing developmental trajectories: A semi-parametric, group-based approach. Psychological Methods, 4, 139–177.
Peacock, M. J., Murray, C. B., Ozer, D., & Stokes, J. (1996). The development of the Black Family Process Q-Sort. In R. Jones (Ed.), Handbook of tests and measurements for Black populations (Vol. 4, pp. 475–493). Hampton, VA: Cobb & Henry.
QSR International. (2004). N6: The sixth version of the world leading NUD-IST software. Retrieved June 4, 2004, from http://www.qsr.com.au/
Rocke, D. (2002, May). Multivariate outlier detection and cluster identification. Paper presented at the International Conference on Robust Statistics, Davis, CA.
Seber, G. A. F. (1984). Multivariate observations. New York: Wiley.
Steinley, D. (2003). Local optima in K-means clustering: What you don’t know may hurt you. Psychological Methods, 8, 294–304.
Stutz, J., & Cheeseman, P. (1995). AutoClass: A Bayesian approach to classification. In J. Skilling & S. Sibisi (Eds.), Maximum entropy and Bayesian methods (pp. 117–126). Boston: Kluwer.
Tabachnik, B. G., & Fidell, L. S. (1989). Using multivariate statistics. New York: Harper & Row.
Vermunt, J. K. (1996). Causal log-linear modeling with latent variables and missing data. In U. Engel & J. Reincke (Eds.), Analysis of change: Advanced techniques in panel data (pp. 35–60). New York: Walter de Gruyter.
Vermunt, J. K., & Magidson, J. (2003). Latent class models for classification. Computational Statistics and Data Analysis, 41, 531–537.
Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236–244.
Wishart, D. (1969). An algorithm for hierarchical classifications. Biometrics, 25, 165–170.
Wishart, D. (1982). Supplement, CLUSTAN user manual (3rd ed.). Edinburgh, Scotland: Program Library Unit, Edinburgh University.
Yzerbyt, V. Y., Muller, D., & Judd, C. M. (2004). Adjusting researchers’ approach to adjustment: On the use of covariates when testing interactions. Journal of Experimental Social Psychology, 40, 424–431.

Received February 15, 2004
Revision received August 30, 2004
Accepted October 5, 2004
