The Journal of Nutrition
Methodology and Mathematical Modeling

Factor Analysis Is More Appropriate to Identify
Overall Dietary Patterns Associated with
Diabetes When Compared with Treelet
Transform Analysis1–3
Danielle A. J. M. Schoenaker,4 Annette J. Dobson,4 Sabita S. Soedamah-Muthu,5 and Gita D. Mishra4*
School of Population Health, University of Queensland, Brisbane, Queensland, Australia; and 5Division of Human Nutrition,
Wageningen University, Wageningen, The Netherlands

Downloaded from by guest on November 14, 2017

Treelet transform (TT) is a proposed alternative to factor analysis for deriving dietary patterns. Before applying this method to nutrition data, further analyses are required to assess its validity in nutritional epidemiology. We aimed to compare dietary patterns from factor analysis and TT and their associations with diabetes incidence. Complete data were available for 7349 women (50–55 y at baseline) from the Australian Longitudinal Study on Women's Health. Exploratory factor analysis and TT were performed to obtain patterns by using dietary data collected from an FFQ. Generalized estimating equations analyses were used to examine associations between dietary patterns and diabetes incidence. Two patterns were identified by both methods: a prudent and a Western dietary pattern. Factor analysis factors are a linear combination of all food items, whereas TT factors also include items with zero loading. The Western pattern identified by factor analysis showed a significant positive association with diabetes [highest quintile: OR = 1.94 (95% CI: 1.25, 3.00); P-trend = 0.001]. Factor analysis and TT involve different assumptions and subjective decisions. TT produces clearly interpretable factors accounting for almost as much variance as factors from factor analysis. However, TT patterns include food items with zero loading and therefore do not represent overall dietary patterns. The different dietary pattern loading structures identified by the 2 methods result in different conclusions regarding the relationship with diabetes. Results from this study indicate that factor analysis might be a more appropriate method than TT for identifying overall dietary patterns associated with diabetes. J. Nutr. 143: 392–398, 2013.

1 The Australian Longitudinal Study on Women's Health, which was conceived and developed by groups of interdisciplinary researchers at the Universities of Newcastle and Queensland, is funded by the Australian Government Department of Health and Ageing. Gita D. Mishra is supported by the Australian National Health and Medical Research Council Centre for Research Excellence in Women's Health.
Author disclosures: D. A. J. M. Schoenaker, A. J. Dobson, S. S. Soedamah-Muthu, and G. D. Mishra, no conflicts of interest.
Supplemental Table 1 is available from the "Online Supporting Material" link in the online posting of the article and from the same link in the online table of contents at
* To whom correspondence should be addressed. E-mail: au.
© 2013 American Society for Nutrition.
Manuscript received September 4, 2012. Initial review completed October 9, 2012. Revision accepted November 30, 2012.
First published online January 23, 2013; doi:10.3945/jn.112.169011.

A diet consists of a variety of foods with complex combinations of nutrients that are likely to interact. A way to examine the joint effect of food intakes and capture overall diet is to derive dietary patterns using appropriate statistical methods. The identification of dietary patterns offers a comprehensive approach to study eating habits and makes it possible to examine the relations with disease risk in order to propose well-grounded dietary guidelines.

Type 2 diabetes is a growing public health problem; nevertheless, results from the Nurses' Health Study (1) show that the majority of cases could be avoided by behavior modification, including maintaining a diet high in fiber and low in saturated and trans fat and glycemic load. A great deal of epidemiological and clinical research on the role of diet and diabetes has resulted in a considerable body of evidence relating specific dietary patterns to the risk of diabetes (2–4). However, studies have used different approaches for identifying dietary patterns. Two general approaches have been used in observational studies: a priori methods, where nutritional variables are grouped according to prior knowledge or theory of a healthy diet (5,6), and a posteriori methods, where dietary patterns are derived from statistical modeling of dietary data allowing for hypothesis-generating analyses.

Factor analysis is a widely used a posteriori method to identify dietary patterns (7). The use of factor analysis, however, remains controversial in the field of nutritional epidemiology because of subjective choices made throughout the analytical process. Examples are pregrouping of original food items prior to analysis, choice of the number of factors to extract, the
TABLE 1 Factor analysis and TT for identifying dietary patterns: features/aims, assumptions, and decisions associated with each method

Features/aims
  Factor analysis:
    Uses the correlation matrix of food items to derive dietary patterns
    Factor loadings on each food item are used to identify important foods contributing to each dietary pattern to extract maximum variance
  TT:
    Uses the correlation matrix of food items to derive dietary patterns
    Factor loadings generally on only a few food items are used to identify important foods contributing to each dietary pattern
    Provides a hierarchical cluster tree to visually identify dietary patterns

Assumptions
  Factor analysis:
    Each factor is a linear combination of all food items to capture overall diet
  TT:
    Sparsity: factors contain only a few food items (omitting other food items by giving them zero loadings)

Decisions
  Factor analysis:
    Pregrouping of original food items prior to analysis
    The number of factors to extract
    The method of rotation
    Labeling of factors
  TT:
    Pregrouping of original food items prior to analysis
    The number of factors to extract
    Determining the optimal cut-level for the cluster tree
    Labeling of factors

TT, Treelet transform.

method of rotation, and labeling of factors (8,9). Furthermore, the fact that each factor is a linear combination of all original food items may make interpretation complicated.

New estimation methods have been proposed to overcome these limitations and provide better insight into diet and disease etiology. Recently, the Treelet transform (TT), developed by Lee et al. (10), was suggested to overcome some of the limitations of factor analysis mainly by improving the interpretation of factors (11). Gorst-Rasmussen et al. (11) compared TT and Procrustes-rotated principal component analysis (PCA) as explanatory methods to study dietary patterns and the risk of myocardial infarction in middle-aged men in a Danish prospective cohort study. Risk estimates were not comparable with those obtained using PCA, even though they found that TT factors were easier to interpret due to the graphical representation of the clustering of food items and the limited number of food items with a factor loading. The authors concluded that TT may be a useful alternative to factor analysis (11). To further assess the validity of this approach in nutritional epidemiology, however, comparisons are needed with other methods of determining dietary patterns and with the resultant associations for a range of health outcomes.

Therefore, the aim of the present study was to compare dietary patterns derived by factor analysis, a widely used method, and the proposed alternative, TT. Our second aim was to compare the associations between these dietary patterns and incidence of diabetes. Associations between dietary patterns and diabetes have been extensively studied in the literature (2–4), which makes diabetes an ideal outcome for critical evaluation of a newly proposed analysis method.

Participants and Methods
The Australian Longitudinal Study on Women's Health. The Australian Longitudinal Study on Women's Health is a prospective study of factors affecting the health and well-being of 3 cohorts of Australian women born in 1973–1978 ("young"), 1946–1951 ("mid-age"), and 1921–1926 ("older"). Women were randomly selected from the national Medicare health insurance database, which includes all Australian citizens and permanent residents. Women from rural and remote areas were intentionally oversampled (12). Since 1996 surveys have been administered to each cohort every 2–4 y on a rolling basis. Further details of the recruitment methods and response have been described elsewhere (13). Informed consent was obtained from all participants at each survey, with ethical clearance obtained from the Human Research Ethics Committees of the University of Newcastle and the University of Queensland.

Participants and surveys. The present study focuses on women in the mid-age cohort. In 1996, 13,715 women aged 45–50 y participated in the baseline survey (survey 1). This was estimated to be a 53–56% response rate for this age cohort (12). Diabetes was assessed at every survey and dietary intake was first assessed at the third survey (S3). From the initial mid-age cohort, 11,226 women aged 50–55 y in 2001 completed S3. This study further includes women during follow-up who responded to the fourth, fifth, and sixth surveys in 2004 (S4, n = 10,905), 2007 (S5, n = 10,638), and 2010 (S6, n = 9748), respectively. Attrition occurred mainly due to participants not returning the survey or inability to contact the participant (14). Percentages of women deceased between surveys are 0.4% at S2, 0.5% at S3, 0.8% at S4, 0.7% at S5, and 0.8% at S6. Women with history of type 1 or 2 diabetes or impaired glucose tolerance (n = 745) or a history of cardiovascular disease (n = 703) before or at S3, or with incomplete dietary data at S3 (n = 1627) were excluded; the data of 8065 participants were used for obtaining dietary patterns. Those with missing data on covariates (n = 716) were then excluded, leaving complete data of 7349 women for analysis of the associations between dietary patterns and incident diabetes.

Dietary intake. At S3, diet was assessed using an FFQ: the Dietary Questionnaire for Epidemiological Studies version 2. The development of the questionnaire (15) and its validation were previously reported (16). A total of 63 women completed 7-d weighted food records next to the FFQ. Nutrient intakes were compared and deattenuated correlations corrected for daily variation in nutrient intake ranged between 0.28 for total vitamin A and 0.78 for carbohydrate after energy adjustment, indicating that the FFQ was useful for assessing habitual intake (16). Participants were asked to report their usual frequency of consumption

Abbreviations used: MET, total metabolic equivalent; PCA, principal component analysis; S1, baseline survey; S2, S3, etc., second, third, etc. survey; TT, Treelet transform.

Comparing methods for dietary pattern analysis 393

of 74 food and 6 alcoholic beverage items during the previous 12 mo using 10 response options ranging from "never" to "three or more times per day." The FFQ additionally included 21 items about the number of servings and type of milk, bread, fat spread, sugar, eggs, and cheese consumed. Ten items on nonalcoholic beverages with similar response options were also included. For clear visualization of features of factor analysis and TT, a total of 111 food items were aggregated into 40 groups based on similarity of nutrient profiles or culinary usage among the foods for dietary pattern analysis (Supplemental Table 1). Some original food items were preserved either because it was inappropriate to aggregate them into a particular food group (peanut butter, eggs, coffee, and sugar) or because they might represent distinct patterns (potatoes with fat, potatoes without fat, white bread, nuts, tomato sauce, and flavored milk). All responses were converted to frequency of consumption per day for analysis. Nutrient intakes, including total energy and energy from alcohol, were computed from the national government food composition database of Australian foods, the NUTTAB95 (17).

Diabetes incidence. Participants were asked at each survey whether a doctor had told them that they had diabetes. At S4 to S6, they were asked whether they had been diagnosed with diabetes since the previous survey. Diagnosis of diabetes was not differentiated into type 1 or type 2 at S4 to S6 and is therefore referred to as diabetes.

Covariates. Lifestyle characteristics available at different surveys were used for analysis. Education was only asked at S1 and responses were categorized as: no formal qualifications, school or intermediate certificate, higher school certificate, trade/certificate, or university degree. Smoking status was available at S3 to S6 and defined as: never smoked, ex-smoker, smoker (<10 cigarettes/d), smoker (10–19 cigarettes/d), or smoker (≥20 cigarettes/d). Participants were asked about alcohol consumption at S3 to S6, which was classified according to the National Health and Medical Research Council classifications as: nondrinker, low-risk drinker (≤14 drinks/wk), or risky drinker (>14 drinks/wk) (18). Physical activity scores obtained at S3 to S6 were derived from self-reported frequency and duration of walking (for recreation or transport) and from moderate- and vigorous-intensity activity in the last week. The total metabolic equivalent (MET) in minutes/week was calculated as (walking minutes × 3.5) + (moderate minutes × 4) + (vigorous minutes × 7.5) (19–21). Physical activity was then categorized as: sedentary (0 to <40 MET min/wk), low (40 to <600 MET min/wk), moderate (600 to <1200 MET min/wk), or high (≥1200 MET min/wk). Participants were asked at S3 to S6 whether they had been diagnosed with or treated for hypertension in the last 3 y, with responses categorized as "yes" or "no." BMI, available at S3 to S6, was computed as self-reported weight (kg)/height² (m²) and categorized as: underweight (BMI <18.5 kg/m²), healthy weight (BMI 18.5 to <25 kg/m²), overweight (BMI 25 to <30 kg/m²), or obese (BMI ≥30 kg/m²) according to the WHO classification (22).

TABLE 2 Baseline characteristics (S3) of 7349 middle-aged study participants in The Australian Longitudinal Study on Women's Health

Characteristic                                  Value
Age, y                                          53 ± 1.5
Total energy intake, kJ/d                       6960 ± 2390
BMI
  Underweight, BMI <18.5 kg/m²                  87 (1.2)
  Healthy weight, BMI 18.5 to <25 kg/m²         2896 (39.4)
  Overweight, BMI 25 to <30 kg/m²               2499 (34.0)
  Obese, BMI ≥30 kg/m²                          1867 (25.4)
History of hypertension                         1837 (25.0)
Physical activity
  Sedentary, 0 to <40 MET min/wk                1110 (15.1)
  Low, 40 to <600 MET min/wk                    2058 (28.0)
  Moderate, 600 to <1200 MET min/wk             1631 (22.2)
  High, ≥1200 MET min/wk                        2550 (34.7)
Smoking status
  Never smoked                                  4049 (55.1)
  Ex-smoker                                     2322 (31.6)
  Smoke <10 cigarettes/d                        331 (4.5)
  Smoke 10–19 cigarettes/d                      272 (3.7)
  Smoke ≥20 cigarettes/d                        375 (5.1)
Education (S1)
  No formal qualifications                      1014 (13.8)
  School or Intermediate certificate            2337 (31.8)
  Higher school certificate                     1242 (16.9)
  Trade/apprenticeship/certificate/diploma      1507 (20.5)
  University/higher degree                      1249 (17.0)
Alcohol consumption
  Nondrinker                                    1279 (17.4)
  Low risk drinker (≤14 drinks/wk)              5982 (81.4)
  High risk drinker (>14 drinks/wk)             88 (1.2)

Data are mean ± SD or n (%). MET, total metabolic equivalent; S1, baseline survey; S3, third survey (baseline for the present study).

Statistical analyses. Dietary patterns at S3 were identified by both exploratory factor analysis and TT for 8065 women. These dimension-reduction methods are comparable, because they aggregate food items based on correlation; however, they involve different aims, assumptions, and decisions to be made (Table 1).

Factor analysis aims to identify patterns with a linear composition of all food items that account for the largest amount of variation in diet between individuals. Factors obtained were rotated using the (orthogonal) Varimax procedure to facilitate interpretability and ensure orthogonality (23). To enhance comparability with TT factors that are correlated, (oblique) Procrustes-rotated factor analysis was also performed in secondary analysis (24).

TT aims to explain variation with factor simplicity by introducing sparsity in factor loadings (i.e., the number of food items with zero loading). TT produces sparse factors as well as a cluster tree to visualize the grouping structure among food items. Treelets were constructed as follows: first, the 2 original food items with the highest correlation were grouped together and local PCA was performed. These 2 items were then replaced by a sum variable, which was retained, and a difference variable, which was disregarded (11). This algorithm was repeated until all original food items were included in the hierarchical cluster tree. This cluster tree was "cut" at a given level to provide high variance factors that indicate related groups of food items that describe an underlying factor (10). More variation can be explained at the cost of sparsity when the cluster tree is cut near its roots. We determined an optimal cut-level by using 10-fold cross-validation (10). To assess sensitivity of the choice of cut-level, TT analyses were repeated at ±3 levels of the optimal level (11).

For both methods, the number of factors that best represented the data was based on eigenvalues ≥1.25, identification of a break point in the screeplot, and interpretability (25). For each subject, factor scores were calculated for each of the retained factors by summing the frequency of consumption multiplied by factor loadings across all food items with >0 loading. Pearson correlation coefficients were obtained between dietary patterns derived by factor analysis and TT. The stability of factors was assessed by performing factor analysis and TT in 2 random subsamples. Correlations between factors from the 2 subsamples were used as a stability measure for both methods.

Generalized estimating equations (GEE) analyses with link function 'logit' were used to analyze the longitudinal relationships between quintiles of dietary pattern scores and incidence of diabetes to take into account the correlation between repeated measures in the same individual (26). An unstructured correlation matrix was used for within-subject correlation. Models were used to produce ORs and 95% CIs for the associations between factors (continuous scores) at S3 and incidence of diabetes at S4 to S6. Adjustments were made for potential confounders: total energy intake (including energy from

394 Schoenaker et al.

alcohol) at S3 (kJ/d), education at S1 (5 categories), smoking status at S3 to S6 (5 categories), alcohol consumption at S3 to S6 (3 categories), physical activity at S3 to S6 (4 categories), and (continuous) factor scores of other dietary patterns at S3. The potential effect modifiers hypertension at S3 to S6 (yes or no) and BMI at S3 to S6 (4 categories) were adjusted for in separate models 2 and 3, respectively. RRs are expressed per 1 SD of the factor scores. Statistical analysis was conducted using Stata version 11.1 (StataCorp). For TT, an add-on for Stata was used (27). P values < 0.05 were considered significant.

Results
Dietary patterns. Characteristics of the women are described in Table 2. Three factors for factor analysis and 2 factors for TT had a variance ≥1.25. Also taking the screeplot and interpretability of factors into account, 2 factors were obtained from both factor analysis and TT. Results from cross-validation for TT indicated an optimal cut-level for the cluster tree of 24. Factors from factor analysis were labeled based on high factor loadings (≥0.20) and are presented in Figure 1A (prudent patterns) and 1B (Western patterns). For TT, nonzero loadings were used for labeling (Fig. 1A,B, right) together with the cluster tree (Fig. 2). The percentage of variance explained by the 2 factors was similar for factor analysis and TT (19.1 and 18.6%, respectively).

The structure of factor 1 was comparable between factor analysis (Fig. 1A,B, left) and TT (Fig. 1A,B, right) (Pearson correlation r = 0.99; P < 0.0001) and primarily characterized by fruit, vegetables, high-fiber bread, medium-fat dairy, and fish. Factor 1 was labeled as a prudent pattern (Fig. 1A). Factor 2 showed high loadings on take-away food, meat, and snacks for both factor analysis and TT and was labeled as a Western pattern (Fig. 1B) (Pearson correlation r = 0.70; P < 0.0001). Factor 2 from factor analysis also included high loadings on high-fat dairy, potatoes with fat, and white bread. The correlation between scores from factor analysis and TT factor 2 was 0.72. Factors derived using (oblique) Procrustes-rotated factor analysis were similar to those derived using (orthogonal) Varimax-rotated factor analysis (data not shown).

TT analyses at cut-levels 21 and 27 (24 ± 3) produced similar factors to those described but with slightly different loadings. Compared with the factor obtained using cut-level 24, the prudent pattern included zero loadings for high-fiber bread and tea and water when obtained using cut-level 21.

Results from stability analyses indicate stability for the prudent pattern from both methods (Pearson correlation between factors from 2 subsamples: r = 0.98, P < 0.0001 for factor analysis and r = 0.97, P < 0.0001 for TT). The Western patterns obtained from factor analysis in the 2 subsamples correlated highly (Pearson correlation r = 0.97; P < 0.0001), whereas the Western pattern from TT seemed less stable (Pearson correlation r = 0.84; P < 0.0001).

Associations between dietary patterns and incidence of diabetes. During 9 y of follow-up (S3 to S6), 635 incident diabetes cases occurred among 7346 participants. The Western pattern obtained from factor analysis was positively associated with incidence of diabetes after adjustments were made for energy intake and lifestyle factors [highest quintile: OR = 1.94 (95% CI: 1.25, 3.00); P-trend: 0.001] (Table 3). This association was hardly affected by additional adjustment for hypertension or BMI. No significant associations were found for the prudent patterns and the Western pattern identified by TT in relation to incidence of diabetes. Including factor scores of the other dietary pattern in the models did not change our results after adjusting for energy intake and lifestyle factors.

Discussion
In our study of dietary patterns and incidence of diabetes among Australian middle-aged women, we found that 2 dietary patterns could be identified by factor analysis and TT, explaining a similar amount of variation. Similar labels were given to both patterns; however, they differ regarding factor loadings and number of items with a loading. Therefore, relating these patterns to incidence of diabetes resulted in different conclusions from the different methods. Our results from factor analysis

FIGURE 1 Factor loadings for factors derived by factor analysis and TT for prudent patterns (A) and Western patterns (B) for participants in The Australian Longitudinal Study on Women's Health (n = 8065). TT, Treelet transform.


FIGURE 2 Cluster tree produced by TT to identify dietary patterns in participants in The Australian Longitudinal Study on Women's Health. The dashed line indicates the selected cut-level (level 24) for the cluster tree (n = 8065). TT, Treelet transform.

show evidence for an inverse association between the prudent pattern and incidence of diabetes. The Western pattern identified by factor analysis was positively associated with diabetes.

The majority of studies that have examined dietary patterns in relation to diabetes have revealed 2 primary patterns, mostly labeled as a prudent and a Western pattern (2–4). The prudent pattern, often characterized by high intakes of fruit, vegetables, fish, whole grains, and low-fat dairy products, has an inverse association with diabetes (2–4). These food items load highly on our prudent patterns, which had a significant inverse association with diabetes when identified by factor analysis. The pattern referred to as Western in many studies consists of high consumption of red meat, processed meat, refined grains, high-fat dairy products, high-sugar drinks, and sweets and is positively associated with diabetes risk (2–4). This is in line with our results from factor analysis: we showed a positive association for the Western pattern with incidence of diabetes. Even though caution should be taken when comparing results because of the highly heterogeneous nature of dietary pattern analyses and risk of chronic diseases, our findings on dietary patterns identified by TT in relation with incidence of diabetes are not in line with consistent findings from existing literature (2–4).

Strengths of the present study include the large sample size, the longitudinal design, the detailed information on lifestyle factors, and the use of a validated FFQ (16). A limitation in our study is the reliance on self-reported diabetes. Additionally, measurement error is a concern in self-reported dietary assessment. Although these limitations are generally thought to be nondifferential with regard to the outcome, they may have influenced the magnitude and the direction of the associations. Also, the amount of missing data is a constraint in our study. However, the distribution of lifestyle characteristics was similar for participants included in the present study and participants excluded due to missing data, except for hypertension and physical activity (data not shown). Participants excluded from the present study were more likely to have hypertension compared with participants included (28.8 vs. 25.0% at S3, respectively) and to be less physically active (24.9 vs. 34.7% reported a high physical activity level at S3, respectively). Another limitation is the possibility of residual confounding resulting from measurement errors of confounders or factors we did not measure.

In our study, we identified 2 dietary patterns from Varimax-rotated factor analysis and TT with Pearson correlations of 0.99 and 0.72 between the 2 methods for factor 1 and 2, respectively. Gorst-Rasmussen et al. (11) found in their study 7 patterns derived using Procrustes-rotated PCA and TT with correlations ranging from 0.48 to 0.99. In that study, the percentage of variance explained was 36.9% for PCA factors and 31.0% for TT factors (11), while factors obtained in our study explained a similar percentage of variance from both methods (19.1% for factor analysis vs. 18.6% for TT). Gorst-Rasmussen et al. (11) showed in their study that associations between patterns derived by the 2 methods and risk of myocardial infarction were not identical, which is in line with our finding that associations with incidence of diabetes differed between patterns derived using the 2 methods. The selected number of food items contributing to dietary patterns from TT did not result in significant associations with diabetes incidence, whereas the contribution of all food items to dietary patterns identified using factor analysis resulted in significant associations with diabetes, which is in line with the literature (2–4).

We demonstrated that TT may have several advantages over factor analysis. First, the key properties of TT are the production of sparse factors and a cluster tree to visually identify related groups of food items. The complex loading pattern of factor analysis factors makes identification of a pattern challenging, whereas sparse loadings together with the cluster tree provide simpler characterization of a pattern from TT. To achieve a simple structure of factors from factor analysis and produce factors that are easy to interpret, rotation of the factor loadings is a commonly used method (23). However, the choice of approach (orthogonal or oblique) is arbitrary, with possible impact on the final conclusion. To explore the impact of the rotation method used on the final results or conclusions, both rotation methods could be used in sensitivity analysis. In our study, using orthogonal or oblique rotation did not affect the dietary patterns identified and their associations with diabetes. We used Varimax rotation to obtain orthogonal factors, which provides nearly uncorrelated, distinct factors that can be related to risk of disease. TT has the advantage that it is able to automatically produce the factors. Also, from a dimension reduction perspective, TT performs almost as well as factor analysis; even with sparse factors, TT factors captured almost as much variation as factors identified from factor analysis.
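
As an illustration of the orthogonal rotation step discussed above, the following is a minimal NumPy sketch of a Varimax rotation using the widely used SVD-based iteration. It is not the Stata routine used in this study; the function name `varimax`, its parameters, and the example loading matrix are assumptions introduced here for illustration only.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate an (items x factors) loading matrix.

    Returns the rotated loadings and the orthogonal rotation matrix.
    Illustrative sketch only; not the implementation used in the study.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target matrix of the Varimax criterion.
        target = rotated**3 - rotated * (np.sum(rotated**2, axis=0) / p)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # orthogonal polar factor of the target
        new_criterion = np.sum(s)
        # Stop once the sum of singular values no longer improves.
        if criterion > 0 and new_criterion <= criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

# Hypothetical unrotated loadings for four food groups on two factors.
unrotated = np.array([[0.6, 0.5], [0.7, 0.4], [0.5, -0.6], [0.4, -0.7]])
rotated, rotation = varimax(unrotated)
```

Because the rotation matrix is orthogonal, each food group keeps the same communality (row sum of squared loadings) before and after rotation; only the distribution of loadings across factors changes.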
TABLE 3 Incidence of diabetes and dietary patterns identified by factor analysis and TT in participants in The Australian Longitudinal Study on Women's Health (n = 7349)

Quintiles of dietary pattern scores per 1 SD

          1          2                  3                  4                  5                  P-trend

Factor analysis: prudent pattern
Model 1   Reference  1.05 (0.74, 1.51)  0.99 (0.68, 1.51)  0.83 (0.56, 1.22)  0.87 (0.59, 1.29)  0.26
Model 2   Reference  1.09 (0.76, 1.55)  1.00 (0.70, 1.44)  0.92 (0.63, 1.34)  0.93 (0.63, 1.37)  0.49
Model 3   Reference  1.14 (0.80, 1.63)  1.05 (0.73, 1.51)  0.89 (0.60, 1.31)  0.94 (0.64, 1.38)  0.40
TT: prudent pattern
Model 1   Reference  1.07 (0.74, 1.55)  0.98 (0.67, 1.43)  0.86 (0.58, 1.27)  0.99 (0.67, 1.47)  0.62
Model 2   Reference  1.06 (0.74, 1.52)  0.98 (0.68, 1.43)  0.92 (0.62, 1.34)  1.06 (0.72, 1.55)  0.94
Model 3   Reference  1.11 (0.77, 1.59)  1.01 (0.70, 1.47)  0.91 (0.62, 1.35)  1.04 (0.71, 1.53)  0.81
Factor analysis: Western pattern
Model 1   Reference  1.17 (0.73, 1.85)  2.04 (1.35, 3.10)  1.86 (1.21, 2.84)  1.94 (1.25, 3.00)  0.001
Model 2   Reference  1.20 (0.76, 1.89)  1.99 (1.32, 3.00)  1.69 (1.11, 2.56)  1.86 (1.21, 2.86)  0.002
Model 3   Reference  1.12 (0.71, 1.77)  1.91 (1.27, 2.89)  1.73 (1.14, 2.64)  1.73 (1.12, 2.67)  0.003
TT: Western pattern
Model 1   Reference  0.82 (0.55, 1.22)  1.16 (0.80, 1.69)  1.18 (0.81, 1.70)  0.83 (0.55, 1.25)  0.97
Model 2   Reference  0.76 (0.51, 1.13)  1.13 (0.79, 1.62)  1.01 (0.70, 1.46)  0.75 (0.50, 1.12)  0.56
Model 3   Reference  0.75 (0.50, 1.12)  1.09 (0.75, 1.57)  1.07 (0.74, 1.54)  0.73 (0.49, 1.10)  0.59

Values are OR and 95% CI. TT, Treelet transform.
Model 1 adjusted for total energy intake (kJ/d), education (no formal qualifications, school or intermediate certificate, higher school certificate, trade/certificate, and university degree), smoking status (never smoked, ex-smoker, smoke <10 cigarettes/d, smoke 10–19 cigarettes/d, and smoke ≥20 cigarettes/d), alcohol consumption [low-risk drinker (≤14 drinks/wk), nondrinker, and risky drinker (>14 drinks/wk)], and physical activity (sedentary, low, moderate, and high).
Model 2 adjusted for covariates in model 1 and hypertension (yes, no).
Model 3 adjusted for covariates in model 1 and BMI (underweight, BMI <18.5; healthy weight, 18.5 ≤ BMI < 25; overweight, 25 ≤ BMI < 30; and obese, BMI ≥30).
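
The treelet construction described under Statistical analyses (repeatedly merging the most correlated pair of variables with a local PCA, retaining the sum variable, and stopping at a chosen cut-level) can be sketched roughly as follows. This is a simplified NumPy illustration under stated assumptions, not the Stata add-on used in this study: the function name `treelet`, its arguments, and the toy covariance matrix are hypothetical, and the cross-validation used to choose the cut-level is omitted.

```python
import numpy as np

def treelet(cov, cut_level):
    """Run `cut_level` treelet merges on a covariance (or correlation) matrix.

    Returns an orthonormal basis (columns are candidate factors) and the
    variance of each basis vector. Requires cut_level <= p - 1.
    Illustrative sketch only; not the implementation used in the study.
    """
    p = cov.shape[0]
    c = np.array(cov, dtype=float)
    basis = np.eye(p)
    active = list(range(p))  # "sum" variables still eligible for merging
    for _ in range(cut_level):
        sd = np.sqrt(np.diag(c))
        corr = np.abs(c / np.outer(sd, sd))
        # Most strongly correlated pair among the active variables.
        a, b = max(
            ((i, j) for n, i in enumerate(active) for j in active[n + 1:]),
            key=lambda ij: corr[ij],
        )
        # Local PCA on the pair: Jacobi rotation that zeroes c[a, b].
        theta = 0.5 * np.arctan2(2.0 * c[a, b], c[a, a] - c[b, b])
        rot = np.eye(p)
        rot[a, a] = rot[b, b] = np.cos(theta)
        rot[a, b], rot[b, a] = -np.sin(theta), np.sin(theta)
        c = rot.T @ c @ rot
        basis = basis @ rot
        # Keep the higher-variance component (sum), retire the difference.
        active.remove(b if c[a, a] >= c[b, b] else a)
    return basis, np.diag(c).copy()

# Toy covariance with two correlated pairs of (hypothetical) food groups.
cov = np.array([[1.0, 0.8, 0.0, 0.0],
                [0.8, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.5],
                [0.0, 0.0, 0.5, 1.0]])
basis, variances = treelet(cov, cut_level=2)
top = np.argsort(variances)[::-1][:2]  # indices of the 2 highest-variance factors
sparse_factors = basis[:, top]
```

In this toy example each retained factor loads on only 2 of the 4 variables, which mirrors the sparsity property of TT factors; cutting at a higher level would merge more variables into each factor and reduce sparsity.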

In contrast to factor analysis, TT does not automatically produce factors with high variance. A subjective decision is made when selecting the cut-level for the cluster tree using cross-validation before high variance factors can be extracted. The cut-level influences both the sparsity and the grouping of the factors and might therefore affect the results when looking at associations with disease incidence. Lowering the cut-level results in increased sparsity, whereas increasing the cut-level decreases sparsity, showing contributions from all food items to each factor, comparable to factor analysis. Increasing the sparsity improves interpretability, but at the same time, the factor variances increase, which might result in unstable results (28). Performing TT with different cut-levels reveals the instabilities and helps determine the optimal level (11). Pattern structures remained comparable in our study when obtained using cut-levels of ±3; however, further decreasing or increasing the cut-level would most likely have a larger influence on the structure of the patterns. Instead of cutting the tree at a single height, another approach could be to start near the root of the tree and descend deeper into the tree, looking for optimal identification of patterns regarding number of food items with a non-zero loading, interpretability, and public health relevance (28).

A major concern when applying TT to nutritional data is whether it is in line with the original aim of dietary pattern analysis: to derive dietary patterns that represent the frequency and amount of all foods consumed to capture overall diet (29). The combined role of all foods is essential in the biologic influence of diet on disease as well as for dietary interventions and public health messages (30). Whereas factors from factor analysis comprise all food items, the sparsity feature of TT results in patterns ignoring foods with zero loading, as in the case of white bread and potatoes with fat. This may be because these food items were not correlated with the newly formed variables from the local PCA and hence were subsequently disregarded (11). These foods, however, show a high loading on the Western pattern from factor analysis and have strong individual relationships with diabetes incidence [white bread: OR = 1.21 (95% CI: 1.12, 1.30); potatoes with fat: OR = 1.69 (95% CI: 1.01, 2.75)]. These different pattern structures, therefore, result in different conclusions regarding their relationship with diabetes incidence.

In summary, we demonstrated that the proposal of a new approach to derive dietary patterns and comparison of methodologies gives insight into the importance of aims and assumptions in such analyses. Both factor analysis and TT involve subjective decisions that should be explored in sensitivity analyses and taken into account when interpreting results and conclusions for public health messages. Sensitivity analyses on, e.g., pregrouping of food items and number of factors to extract can indicate and optimize robustness of results. TT produces clearly interpretable factors that account for almost as much variation as factors from factor analysis, but the sparse factors do not represent an overall dietary pattern. In addition, results on the relation between dietary patterns from TT and incidence of diabetes are not in line with consistent findings from the literature. Results from this study indicate that factor analysis might be a more appropriate method for identifying overall dietary patterns associated with diabetes compared with TT.

Acknowledgments
The authors thank Professor Graham Giles of the Cancer Epidemiology Centre of The Cancer Council Victoria for permission to use the Dietary Questionnaire for Epidemiological Studies (version 2), Melbourne: The Cancer Council Victoria, 1996. D.A.J.M.S., G.D.M., and A.J.D. designed research; D.A.J.M.S. analyzed data and had primary responsibility for final content; G.D.M. and A.J.D. contributed to statistical analysis and interpretation of results and critical revision of the manuscript; and S.S.S-M. contributed by critical revision of the manuscript for important intellectual content. All authors read and approved the final manuscript.

Literature Cited
1. Hu FB, Manson JE, Stampfer MJ, Colditz G, Liu S, Solomon CG, Willett WC. Diet, lifestyle, and the risk of type 2 diabetes mellitus in women. N Engl J Med. 2001;345:790–7.
2. Esposito K, Kastorini CM, Panagiotakos DB, Giugliano D. Prevention of type 2 diabetes by dietary patterns: a systematic review of prospective studies and meta-analysis. Metab Syndr Relat Disord. 2010;8:471–6.
3. Kastorini CM, Panagiotakos DB. Dietary patterns and prevention of type 2 diabetes: from research to clinical practice; a systematic review. Curr Diabetes Rev. 2009;5:221–7.
4. Salas-Salvadó J, Martinez-González M, Bulló M, Ros E. The role of diet in the prevention of type 2 diabetes. Nutr Metab Cardiovasc Dis. 2011;21:B32–48.
5. Haines PS, Siega-Riz AM, Popkin BM. The Diet Quality Index revised: a measurement instrument for populations. J Am Diet Assoc. 1999;99:697–704.
6. Kennedy ET, Ohls J, Carlson S, Fleming K. The Healthy Eating Index: design and applications. J Am Diet Assoc. 1995;95:1103–8.
7. Newby PK, Tucker KL. Empirically derived eating patterns using factor or cluster analysis: a review. Nutr Rev. 2004;62:177–203.
8. Michels KB, Schulze MB. Can dietary patterns help us detect diet and disease associations? Nutr Res Rev. 2005;18:241–8.
9. Martínez ME, Marshall JR, Sechrest L. Invited Commentary: factor analysis and the search for objectivity. Am J Epidemiol. 1998;148:17–9.
10. Lee AB, Nadler B, Wasserman L. Treelets: an adaptive multi-scale basis for sparse unordered data. Ann Appl Stat. 2008;2:435–71.
11. Gorst-Rasmussen A, Dahm CC, Dethlefsen C, Scheike T, Overvad K. Exploring dietary patterns by using the Treelet transform. Am J Epidemiol. 2011;173:1097–104.
12. Brown WJ, Bryson L, Byles JE, Dobson AJ, Lee C, Mishra G, Schofield M. Women’s Health Australia: recruitment for a national longitudinal cohort study. Women Health. 1998;28:23–40.
13. Lee C, Dobson AJ, Brown WJ, Bryson L, Byles J, Warner-Smith P, Young AF. Cohort profile: the Australian Longitudinal Study on Women’s Health. Int J Epidemiol. 2005;34:987–91.
14. Young AF, Powers JR, Bell SL. Attrition in longitudinal studies: who do you lose? Aust N Z J Public Health. 2006;30:353–61.
15. Ireland P, Jolley D, Giles G, O’Dea K, Powles J, Rutishauser I, Wahlqvist ML, Williams J. Development of the Melbourne FFQ: a food frequency questionnaire for use in an Australian prospective study involving an ethnically diverse cohort. Asia Pac J Clin Nutr. 1994;3:19–31.
16. Hodge A, Patterson AJ, Brown WJ, Ireland P, Giles G. The Anti Cancer Council of Victoria FFQ: relative validity of nutrient intakes compared with weighed food records in young to middle-aged women in a study of iron supplementation. Aust N Z J Public Health. 2000;24:576–83.
17. Lewis J, Milligan G, Hunt A. NUTTAB95 Nutrient Data Table for use in Australia. Canberra: Australian Government Publishing Service; 1995.
18. National Health and Medical Research Council. Australian Alcohol Guidelines: health risks and benefits. Canberra: ACT, Commonwealth of Australia; 2001.
19. Brown WJ, Burton NW, Marshall AL, Miller YD. Reliability and validity of a modified self-administered version of the Active Australia physical activity survey in a sample of mid-age women. Aust N Z J Public Health. 2008;32:535–41.
20. Brown WJ, Bauman AE. Comparison of estimates of population levels of physical activity using two measures. Aust N Z J Public Health. 2000;24:520–5.
21. Armstrong T, Bauman A, Davis J. Physical activity patterns of Australian adults. Canberra: Australian Institute of Health and Welfare; 2000.
22. WHO. Obesity: preventing and managing the global epidemic. Geneva: WHO; 2000.
23. Jolliffe IT. Rotation of principal components: choice of normalization constraints. J Appl Stat. 1995;22:29–35.
24. Hurley JR, Cattell RB. The procrustes program: producing direct rotation to test a hypothesized factor structure. Behav Sci. 1962;7:258–
25. Kline P. An easy guide to factor analysis. London: Routledge; 1994.
26. Liang K, Zeger SL. Longitudinal data analysis using generalised linear models. Biometrika. 1986;73:45–51.
27. Gorst-Rasmussen A. tt. Stata add-on for performing treelet transformation; 2011 [cited 2012 Jul 1]. Available from: http://people.math.
28. Meinshausen N, Bühlmann P. Discussion of: Treelets: an adaptive multi-scale basis for sparse unordered data. Ann Appl Stat. 2008;2:478–81.
29. Hu FB. Dietary pattern analysis: a new direction in nutritional epidemiology. Curr Opin Lipidol. 2002;13:3–9.
30. Slattery ML. Analysis of dietary patterns in epidemiological research. Appl Physiol Nutr Metab. 2010;35:207–10.


