Data Analysis in LifeCourseEpi - Article


American Journal of Epidemiology, Vol. 190, No. 9
© The Author(s) 2021. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
https://doi.org/10.1093/aje/kwab087
Advance Access publication: March 29, 2021

Practice of Epidemiology

Data-Driven Model Building for Life-Course Epidemiology

Anne H. Petersen∗, Merete Osler, and Claus T. Ekstrøm

Downloaded from https://academic.oup.com/aje/article/190/9/1898/6189737 by guest on 13 September 2023


∗ Correspondence to Anne H. Petersen, Section of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, 1014
Copenhagen K, Denmark (e-mail: ahpe@sund.ku.dk).

Initially submitted July 2, 2020; accepted for publication March 23, 2021.

Life-course epidemiology is useful for describing and analyzing complex etiological mechanisms for disease
development, but existing statistical methods are essentially confirmatory, because they rely on a priori model
specification. This limits the scope of causal inquiries that can be made, because these methods are suited
mostly to examine well-known hypotheses that do not question our established view of health, which could lead
to confirmation bias. We propose an exploratory alternative. Instead of specifying a life-course model prior to
data analysis, our method infers the life-course model directly from the data. Our proposed method extends the
well-known Peter-Clark (PC) algorithm (named after its authors) for causal discovery, and it facilitates including
temporal information for inferring a model from observational data. The extended algorithm is called temporal
PC. The obtained life-course model can afterward be perused for interesting causal hypotheses. Our method
complements classical confirmatory methods and guides researchers in expanding their models in new directions.
We showcase the method using a data set encompassing almost 3,000 Danish men followed from birth until age
65 years. Using this data set, we inferred life-course models for the role of socioeconomic and health-related
factors on development of depression.

causal discovery; life-course epidemiology; observational data; structure learning

Abbreviations: CPDAG, completed partially directed acyclic graph; DAG, directed acyclic graph; PC, Peter-Clark; TPC, temporal
Peter-Clark; TPDAG, temporal partially directed acyclic graph.

Life-course epidemiology facilitates modeling risk factors as they develop and aggregate over the life course (1). Such a perspective is both useful and necessary in order to understand the etiology of complex chronic diseases such as cardiovascular disease (2), diabetes (3), and mental disorders (4–6). However, it is not obvious exactly how the theoretical life-course framework should be operationalized into study designs facilitating empirical life-course analysis (7).

Currently, empirical life-course studies either 1) rely on traditional statistical exposure-outcome models (for example, regression models) that only address a single chosen outcome at a time, or 2) use joint models that try to describe development over the entire life course at once, for example by use of structural equation models or path analysis (8, 9). In the former case, the life-course perspective is not really utilized, except for interpreting the results, and the models are therefore essentially simplistic risk-factor analyses that give little insight into life-course disease development (10). In the latter case, the models are more consistent with the life-course perspective, but model building largely relies on elaborate theories of cause and effect (11). Such models need to describe both temporal and cross-sectional relationships among the variables, and thus require extensive prior knowledge.

The reliance on a priori model specification has several shortcomings. First, it limits the scope of topics that can be studied using life-course analysis, given that a comprehensive body of prior knowledge is needed. Second, even when studying supposedly well-known phenomena, the confirmatory nature of the methodology poses a risk of reproducing existing biases and limits the ability to uncover new etiological mechanisms. To a large extent, traditional life-course analysis methods only facilitate quantification of mechanisms that we already consider well-established. An exception is approaches that perform model selection,

1898 Am J Epidemiol. 2021;190(9):1898–1907



but existing methods only search through a rather small, prespecified set of competing models (12) and therefore they might not uncover new hypotheses.

In this article, we present an exploratory alternative to classical confirmatory life-course analysis. Instead of specifying a life-course model prior to data analysis, we propose a method for inferring the life-course model directly from the data. Our method is a temporal extension of the Peter-Clark (PC) algorithm (13) for causal discovery (named after its authors, Peter Spirtes and Clark Glymour) and we refer to it as temporal PC (TPC). Although causal discovery algorithms have been available for a long time, their use in epidemiology is limited to only a few studies with small numbers of included variables (see, for example, Rosenström et al. (14)). In the TPC method, we implement new and previously suggested ideas for causal discovery with information about time while tailoring the method specifically for life-course studies. Thereby, we hope to make causal discovery available for the life-course epidemiologist's statistical toolbox.

TERMINOLOGY AND NOTATION

We assume familiarity with the concepts of a directed acyclic graph (DAG), d-separation, confounding and selection variables (sometimes referred to as colliders), and the Markov property (also known as the causal Markov assumption). Hernán and Robins (15) provide an introduction to these topics.

We will consider a data set consisting of p random variables X = X1, ..., Xp, and an observed counterpart of n observations corresponding to realizations of those variables. We assume that the variables are numeric or binary. We can represent the data by use of graphs where each variable is represented by a node in the graph, and we use the terms variable and node interchangeably. When making general statements not associated with a specific data set, we sometimes refer to arbitrarily chosen random variables A, B, and C to ease notation.

The skeleton of a directed graph is obtained by removing orientations from all edges, and it is thus an undirected graph. Two nodes are said to be adjacent if they are connected by an edge. A v-structure is a triplet of nodes A → B ← C where A and C are nonadjacent (for example X3 → X5 ← X2 in Figure 1A).

A DAG is characterized by its Markov equivalence class, which consists of all DAGs that have the same conditional independencies (16). A Markov equivalence class can itself be represented by a completed partially directed acyclic graph (CPDAG). A CPDAG has directed edges whenever all DAGs in the Markov equivalence class agree on the orientation of the edge, and otherwise the edges are undirected. An undirected edge in a CPDAG means that the orientation of the edge cannot be determined from the probability distribution of the variables. Thus, undirected edges should not be interpreted as bidirected; the CPDAG describes a family of DAGs and hence all edges are directed, although we might not know the directionality. A CPDAG is uniquely given by its v-structures and its skeleton (17).

Figure 1. An example of a directed acyclic graph (A), its corresponding skeleton (B), and the completed partially directed acyclic graph that describes its Markov equivalence class (C).

Figure 1 provides an example of a DAG, its underlying skeleton, and its encompassing CPDAG.

We use the following additional notation:
• Z ∈ X denotes that the variable Z is contained in the data set X.
• Z ⊂ X denotes that the set of variables Z (possibly of size 1) is contained in the data set X.
• X \ Y denotes the data set X without the variable(s) Y.
• X ⊥⊥ Y denotes that X and Y are independent.
• X ⊥⊥ Y | Z denotes that X and Y are conditionally independent given Z.

CAUSAL DISCOVERY WITH THE PC ALGORITHM

Before we present the TPC method for life-course discovery, we will introduce the method that it extends, namely the PC algorithm. The PC algorithm is a constraint-based algorithm for causal discovery, which means that it aims to reconstruct the data-generating mechanism of a data set by examining what constraints on the probability distribution this mechanism implies. The correctness of the PC algorithm has been proven mathematically (13).

We describe the algorithm in the oracle setting, where we assume that we have a complete list of all conditional independencies in X: For any variables Xi, Xj ∈ X with i ≠ j and a (possibly empty) set Z ⊂ X \ {Xi, Xj}, we know whether Xi ⊥⊥ Xj | Z is true. This is captured in assumption S1 below, and we discuss how we could estimate conditional independencies empirically in the next section.

PC algorithm in the oracle setting

We provide a conceptual outline of the PC algorithm (see also algorithm 2 in Appendix 1, and refer to Spirtes et al. (18) for a complete, technical description).

The PC algorithm starts with a completely connected graph (step 1) and sequentially removes edges to recover the skeleton (step 2). It does so by utilizing the property that conditional independencies imply absence of edges in a DAG: If 2 variables are conditionally independent given any other set of variables, they cannot be adjacent in the graph. Hence,


for each pair of nodes (Xi, Xj), the algorithm searches for so-called separating sets S such that Xi ⊥⊥ Xj | S. If such an S exists, the edge between Xi and Xj is removed. The algorithm terminates when the smallest possible separating set, or no separating set, is found for each pair of variables (Xi, Xj). Afterward, a complete set of orientation rules can be applied to obtain a CPDAG (step 3). These rules rely primarily on the fact that selection variables create special independence structures that make it possible to recover some v-structures, as well as the assumption of acyclicity. Web Appendix 1 and Web Figures 1–2 (available at https://doi.org/10.1093/aje/kwab087) provide an example that shows how the PC algorithm infers the CPDAG in Figure 1C from information about conditional independencies.

Assumptions

The PC algorithm produces the correct CPDAG under the following 4 assumptions (19):

S1: Conditional independence information is available. This is necessary for the skeleton-building step from algorithm 2 in Appendix 1. In practice, the assumption will be asymptotically satisfied if we have an adequate statistical testing procedure that can produce conditional independence information from empirical data.

C1: Causal sufficiency, which means that there are no unobserved confounding or selection variables. This assumption ensures that we are inferring relationships for the data-generating mechanism rather than just the variables in the observed data. Note that DAGs are generally not valid descriptions of data-generating mechanisms without the assumption of causal sufficiency. Thus, causal sufficiency is a familiar, but perhaps often implicitly stated, assumption of empirical studies utilizing DAGs.

C2: Faithfulness, which ensures that d-separations in the CPDAG imply causal unrelatedness (15). Heuristically, it means that the data-generating mechanism can be described solely by a DAG without an additional list of distributional information. Faithfulness is routinely applied in epidemiologic studies, for example, when a precisely estimated null effect is interpreted as no causal effect; this would not be the case if the data-generating mechanism were not faithful to the graph, as a null finding could then arise from non-null effects canceling each other out perfectly (15). Faithfulness is thus crucial for inferring causal relationships from observational data.

C3: Acyclicity, which states that no cycles are allowed in the data-generating mechanism. This assumption is needed in order to orient edges in the graph, because the orientation rules in step 3 utilize the property that each edge has a unique (but possibly indeterminable) orientation. Note that it is allowed to include multiple variables measuring the same construct at different times, and that this is often a useful strategy for avoiding cyclic graphs.

Following the suggestion of Pearl (16), we want to stress the distinction between statistical (testable) and causal (untestable) assumptions. Assumption S1 relates to the probability distribution of the data, and therefore it is a statistical assumption. Assumptions C1–C3 are, on the other hand, causal; they refer to the data-generating mechanism, which we cannot observe directly.

TEMPORAL PC FOR LIFE-COURSE DISCOVERY

We propose an extension of the PC algorithm that accommodates life-course data where the same individuals are followed over time, and where variables have a known partial temporal ordering into time periods. As examples, we could divide the life course into just 3 periods (childhood, adulthood, and old age) or into many periods each covering 1-year intervals. All variables in the data set must be assigned to exactly 1 of the periods. Variables within a period can be constructed by aggregating over more finely observed measurements within that period, for example as a sum of hospitalization days during the period.

Our proposal is a combination of previous ideas for incorporating (temporal) background information in causal discovery, especially those of Spirtes et al. (18), and specific suggestions for how these ideas can be used with the data structures typically available for life-course epidemiology.

Temporal PC in the oracle setting

Algorithm 2 in Appendix 2, temporal PC, summarizes our modification of the PC algorithm, and steps that have been modified are labeled in bold. We describe the differences between the 2 algorithms in more detail below.

We refer to the model resulting from the TPC algorithm as a life-course model and the corresponding graph as a temporal partially directed acyclic graph (TPDAG). A life-course model contains information about the temporal ordering of variables as well as the causal data-generating mechanism, and it can be analyzed using the usual d-separation rules. Note, however, that the graph is generally no longer a CPDAG, because the temporal information allows us to rule out certain graph instances in the Markov equivalence class. It thus contains more information than a CPDAG.

Example. Consider again the DAG from Figure 1A, and assume that the placement of the nodes reflects their observation times such that X1 and X2 were measured in childhood, X3 and X4 were measured in youth, and X5 and X6 were measured in old age. The TPC algorithm applied on this data-generating mechanism will recover the TPDAG shown in Figure 2 (see Web Appendix 2 and Web Figure 3 for details). In this graph, we see that the childhood variable X2 is directly connected to the old-age variable X5 but not indirectly so via directed paths. This would imply that the childhood exposure X2 had a delayed, latent effect that could not be prevented by interventions during youth.

Temporal modifications. For the skeleton-construction step, we have made the following modifications in the TPC algorithm: Temporal information is accounted for by restricting what variables are considered for separating sets, as suggested by Spirtes et al. (18): When searching for a


separating set S for 2 variables A and B, we do not allow S to contain variables that occur strictly later than both A and B; conditioning on the future is prohibited.

In the edge-orientation step, temporal information is utilized in the TPC algorithm in 2 ways. First, an extra step (3.0) has been inserted where edges connecting nodes from different periods are oriented according to the direction of time. Second, step 3.1 has been modified such that we allow potential v-structures to be oriented only if such an orientation does not result in edges being directed against time.

Figure 2. Temporal partially directed acyclic graph resulting from using the temporal Peter-Clark (TPC) algorithm in a simulated data example. The placement of nodes into columns represents their periods such that X1 and X2 were measured in childhood, X3 and X4 in youth, and X5 and X6 in old age.

Implementation suggestions for empirical setting

An implementation of the TPC algorithm must provide 2 elements:
• Test: a procedure for testing approximate conditional independencies
• Level: a strategy for choosing the level for the tests

We present suggestions for each below.

For simplicity, we assume that all variables are binary or numeric, that there is no missing information in the data, and that the variables have a positive joint density with respect to a product measure so they can vary independently of each other.

Test: regression-based information-loss test. We use regression modeling as a heuristic test of conditional independence. For each pair of variables (Xi, Xj) and potential separating set Z = {Z1, ..., Zm} ⊂ X \ {Xi, Xj}, we fit the 4 models:

$$
\begin{aligned}
M_{0a}&:\; g(X_i) = \sum_{k=1}^{m} f_k(Z_k) \\
M_{1a}&:\; g(X_i) = f_0(X_j) + \sum_{k=1}^{m} f_k(Z_k) \\
M_{0b}&:\; \tilde{g}(X_j) = \sum_{k=1}^{m} \tilde{f}_k(Z_k) \\
M_{1b}&:\; \tilde{g}(X_j) = \tilde{f}_0(X_i) + \sum_{k=1}^{m} \tilde{f}_k(Z_k),
\end{aligned}
$$

where g and g̃ are identity or logit link functions (for numeric and binary outcomes, respectively), and f and f̃ are both defined as follows:

$$
f_k(Z_k) =
\begin{cases}
\alpha \cdot Z_k, & \text{if } Z_k \text{ is binary} \\
s(Z_k), & \text{if } Z_k \text{ is numeric,}
\end{cases}
$$

where s is a cubic spline. Due to lack of symmetry, we need to consider models in both directions. We then test the hypotheses

$$
H_{0a}: M_{0a} = M_{1a} \quad \text{and} \quad H_{0b}: M_{0b} = M_{1b}
$$

using likelihood ratio tests and conclude approximate conditional independence whenever at least one of H0a and H0b is not rejected. Without distributional assumptions, this procedure will test a necessary (but not sufficient) condition for conditional independence. See Web Appendix 3 for more details.

Level: sparsity sequence. The tests described above rely on the choice of a significance level. Instead of considering only a single significance level, we assess how the model develops for a sequence of significance levels. The result will then be a sequence of life-course models that describe the data at different levels of sparsity.

More specifically, we apply the TPC algorithm multiple times. For each application, we use a single significance level for all tests, and we refer to these significance levels as sparsity levels, ψ = {ψ1, ..., ψk}, ψi ∈ [0, 1]. It should be noted that the sparsity levels are not valid significance levels in the sense that they do not measure the risk of type I errors, because the result of one test will have implications for what other tests will be conducted.

Example: TPC on simulated data

Here we provide a small example showcasing practical use of TPC with the implementation suggestions listed above. We apply the method to simulated data. We simulated data according to the data-generating mechanism from Figure 1, using a mix of numeric and binary variables along with curvilinear relationships between variables. Details about the simulations are available in Web Appendix 4. We applied the TPC algorithm with the implementation suggestions outlined above. Replication code is provided in Web Appendix 5. In this scenario, TPC should ideally reproduce the TPDAG in Figure 2.

Figure 3 presents an overview of the results from 100 simulated data sets, and with TPC applied using 3 different sparsity levels, ψ ∈ {0.1, 0.01, 0.001}. For each sparsity level, we provide a graph that summarizes the resulting TPDAGs. A directed edge is drawn if it is identified in any of the 100 simulations, and the edge is annotated with the percentage of simulations for which it was found. The percentages are placed near the end of the directed edge, that is, closest to the arrowhead. Only a single edge was identified with contradictory directions across simulations (the edge between X5 and X6 for ψ = 0.1), and for this edge, there are 2 annotated percentages, one corresponding to each direction. The edge between X1 and X2 was always identified


as undirected, and therefore, the percentage annotation is placed by the middle of the edge. The true TPDAG is marked in black, while spurious edges are marked in gray.

We find that for ψ = 0.1, several spurious edges are sometimes identified; for ψ = 0.01, a single spurious edge is identified, and for ψ = 0.001, only edges from the true TPDAG occur. Generally, we see that TPC is successful in identifying the correct edges, but also that the graphs on average become sparser as ψ decreases. Hence, for this small example, the algorithm performs as expected.

Figure 3. Temporal partially directed acyclic graphs resulting from using the temporal Peter-Clark (TPC) algorithm on data simulated from the data-generating mechanism in Figure 1. All edges that occur at least once over the 100 simulations are included and the edges are annotated with the percentage of times they were identified by the TPC algorithm. The true TPDAG is marked in black, while spurious edges are marked in gray. A) The results for ψ = 0.1; B) the results for ψ = 0.01; C) the results for ψ = 0.001.

APPLICATION: DEVELOPMENT OF DEPRESSION IN DANISH MEN

We showcase the TPC algorithm by investigating how socioeconomic and health-related factors throughout the life course are connected to development of depression in early old age. All computations were performed in R using the package causalDisco (20), and replication code is available in Web Appendix 5.

Data

We use data (n = 2,928) from the Metropolit Cohort (21), encompassing Danish men born in 1953, followed from birth until age 65 years. The data comprise information from several contacts, including surveys at ages 12 and 51 years, along with extensive administrative register data from the Danish national registers.

We consider a total of 33 variables measured in 5 periods over the life course: birth, childhood (approximately age 12 years), youth (ages 18–30 years), adulthood (approximately age 51 years), and early old age (approximately age 65 years). We focus on development of clinical depression but note that a life-course model does not necessarily have a specific outcome of interest. Web Appendix 6, Web Figure 4, and Web Table 1 provide more details about the data.

Results

In the interest of brevity, we present results only for selected sparsity levels here, namely ψ ∈ {0.001, 0.00001, 0.0000001}, but TPDAGs for additional sparsity levels are available in Web Figures 5–13. These specific values of ψ are arbitrary and chosen to showcase how varying the sparsity level affects the resulting TPDAG. Note that when ψ becomes small, we impose a stricter threshold for tests, which means that only the strongest relationships remain in the graph.

In Figure 4, we see that there are generally few edges between birth variables and variables from adulthood. We also see that depression in early old age is adjacent only to depression in adulthood. We also see a large degree of interconnections for variables within the time periods, especially for childhood and adulthood variables.

In Figure 5, a sparser graph has been produced, because ψ has been decreased by a factor of 100, and we see that especially edges from birth variables and adulthood variables have been pruned, while most edges from childhood and youth remain.

In Figure 6, we have again decreased ψ by a factor of 100, and we now find that only a few variables have edges connecting them to nonadjacent time periods. The only exceptions are height and intelligence score measured at youth, which have links to birth variables.

Figure 4. Temporal partially directed acyclic graph for the Metropolit data (Danish men born in 1953) for ψ = 0.001. Nodes are ordered in columns according to time from left to right: birth (orange), childhood (purple), youth (green), adulthood (blue), and early old age (red). The edges are colored according to the first time period that they refer to.

Table 1 provides an overview of the stability of the procedure across all sparsity levels. We see that 1 new edge is added when the sparsity level is changed from ψ = 10^-5 to ψ = 10^-6, but other than that, edges are removed when ψ decreases. The retention thus measures the percentage of edges that are not newly introduced in the TPDAG between 2 consecutive sparsity levels. We see that 96.88% of the edges present in the TPDAG with sparsity 10^-6 are retained from the previous sparsity level. For all other pairs of graphs, the retention rate is 100%, which means that no new edges are added when ψ is reduced by a factor of 10. Thus, when ψ is reduced, not many new edges are introduced in the TPDAGs. This implies that conclusions from small values of ψ are generally retained for larger values of ψ.
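The retention rates discussed above can be recomputed from the edge counts reported in Table 1. The following is a minimal sketch in plain Python, not part of the causalDisco package: the names `TABLE_1`, `retention_rate`, and `retention_by_level` are hypothetical and introduced only for illustration, and the formula (d_total − d_new)/d_total is taken from the table footnote.

```python
# Rows of Table 1 as (exponent of psi, d_total, d_new, d_removed).
# The 10^-2 row is the starting graph, so it has no d_new or d_removed.
TABLE_1 = [
    (-2, 61, None, None),
    (-3, 47, 0, 14),
    (-4, 39, 0, 8),
    (-5, 37, 0, 2),
    (-6, 32, 1, 6),
    (-7, 27, 0, 5),
    (-8, 23, 0, 4),
    (-9, 22, 0, 1),
    (-10, 22, 0, 0),
]


def retention_rate(d_total, d_new):
    """Percentage of edges in a graph that were not newly introduced."""
    return 100.0 * (d_total - d_new) / d_total


def retention_by_level(rows):
    """Map each sparsity exponent (except the first) to its retention rate."""
    rates = {}
    for previous, current in zip(rows, rows[1:]):
        exponent, d_total, d_new, d_removed = current
        # Bookkeeping check: previous edges + new edges - removed edges
        # must equal the current edge count.
        if previous[1] + d_new - d_removed != d_total:
            raise ValueError(f"inconsistent edge counts at 10^{exponent}")
        rates[exponent] = retention_rate(d_total, d_new)
    return rates


rates = retention_by_level(TABLE_1)
for exponent, rate in rates.items():
    print(f"psi = 10^{exponent}: retention = {rate:.2f}%")
```

The bookkeeping check also confirms that the edge counts in each row are consistent with the row above it. For the row with ψ = 10^-6 the computed value is 31/32 = 96.875%, in line with the 96.88% reported in the table.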

Figure 5. Temporal partially directed acyclic graph for the Metropolit data (Danish men born in 1953) for ψ = 0.00001. Nodes are ordered
in columns according to time from left to right: birth (orange), childhood (purple), youth (green), adulthood (blue), and early old age (red). The
edges are colored according to the first time period that they refer to.


Figure 6. Temporal partially directed acyclic graph for the Metropolit data (Danish men born in 1953) for ψ = 0.0000001. Nodes are ordered
in columns according to time from left to right: birth (orange), childhood (purple), youth (green), adulthood (blue), and early old age (red). The
edges are colored according to the first time period that they refer to.

Table 1. Edge Development and Retention Rates for the Temporal Partially Directed Acyclic Graphs From the Application to Metropolit Data, Denmark, for Men Born in 1953

ψ (a)     d_total (b)   d_new (c)   d_removed (d)   Retention (e), %
10^-2     61
10^-3     47            0           14              100.00
10^-4     39            0           8               100.00
10^-5     37            0           2               100.00
10^-6     32            1           6               96.88
10^-7     27            0           5               100.00
10^-8     23            0           4               100.00
10^-9     22            0           1               100.00
10^-10    22            0           0               100.00

Abbreviation: TPDAG, temporal partially directed acyclic graph.
(a) Sparsity level.
(b) The total number of edges in the TPDAG.
(c) The number of newly introduced edges in the TPDAG when compared with the one in the row above (sparsity level factor 10 larger).
(d) The number of edges removed from the TPDAG when compared to the one in the row above.
(e) The retention rate is computed as (d_total − d_new)/d_total.

Interpreting the results

Under assumptions S1 and C1–C3, we may interpret the TPDAGs as descriptions of the causal data-generating mechanism at different levels of imposed sparsity.

Focusing on Figure 6, we apply d-separation to see that depression in early old age is conditionally independent of all remaining variables in the data set given information about depression history in adulthood. There is thus no benefit in including any of the childhood information, or other variables measured in adulthood, if the purpose is understanding why a person develops depression in early old age.

Moreover, we find no causal effects of birth weight or birth length that span longer than youth. This is in contrast to myriad epidemiologic studies linking these factors to diabetes (22), death from ischemic heart disease (23), and mental health outcomes such as depression (24).

Note that the strong interpretations of the results discussed here rely fully on the strong causal assumptions C1–C3. Moreover, the hypotheses in the TPDAG are based on empirical data and imperfect statistical estimation, and therefore, they should not be trusted blindly. But they might serve as inspiration for designing new studies, for instance, by suggesting topics for mediation analysis (such as whether birth weight effects on adulthood variables are mediated by childhood or adolescence variables) or by suggesting what variables need to be considered to make a specific causal effect identifiable (25).

Note also that the TPDAG estimated here is retrospective in the sense that it conditions on the individuals being alive at age 65 years. This is not a consequence of the TPC algorithm as such but rather a property of the data set and the chosen testing procedure.

DISCUSSION

We have proposed a method extending the PC algorithm, temporal PC, that produces life-course models from an


observed data set. We have implemented TPC in the causalDisco R package (20), and we hope it will find practical use in life-course epidemiology. The TPC algorithm was used for generating new hypotheses in an application concerning development of depression. However, it requires strong causal assumptions in order to be interpretable.

A strength of the TPC algorithm is that it considers information from the whole life course jointly and allows for exploratory model building. This facilitates building global models for the whole life course that can provide empirical evidence about presence or absence of causal links between exposures occurring early in life and disease onset much later (9). In particular, our method provides a framework for investigating whether hypotheses such as critical periods of development (26) and the hygiene hypothesis (27) are supported by empirical evidence, and perhaps even disentangle their respective effects, in contrast to existing methods (28). Note, however, that when data are used to infer a model, the same data cannot be used to estimate effect sizes for that model; the estimates would then be biased due to overfitting. This issue can easily be overcome by splitting the data randomly and training the model on one subset, while estimating the effects on the other (29).

We have based our life-course discovery methodology on the PC algorithm, which is a constraint-based discovery algorithm. Constraint-based methods utilize conditional independencies in the data to infer missing causal links, while the competing approach within causal discovery, score-based methods, uses heuristic search strategies to go through different possible models and score each according to fit (for example, using the Bayesian information criterion) (17). Both of these approaches could be useful for life-course discovery.

We provide no method for choosing the sparsity level. This reflects an ontological viewpoint on causal inference in epidemiology: We believe that true data-generating mechanisms for phenomena of epidemiologic interest are generally

Moreover, in the oracle setting there is no difference between the skeletons constructed by TPC and the original PC algorithm; the temporal independence constraints utilized in the former will be available in the data already. However, when conditional independencies are estimated from observed data, the choice of skeleton-construction method can make a difference, because the TPC algorithm places more trust in temporally induced conditional independencies than conditional independencies inferred from data. We consider this to be an attractive feature of the TPC algorithm.

This feature is not present in an alternative suggestion for how to incorporate background information originally proposed by Meek (30) and recently further developed by Perković et al. (31) and implemented in the R package pcalg (32). In these works, first the skeleton (or possibly the full CPDAG) is learned from data alone, and afterward the graph is altered to comply with background information. While the results will be the same as for TPC in the oracle setting, these procedures do not put more trust in background information than statistical tests. When the background information is temporal, we believe that using post-hoc correction is therefore suboptimal.

Other related work includes using the PC algorithm for discovering the structures of time series (33, 34). Note however that methods developed for time-series data require that the same variables are measured at all time points. This data structure is not the typical case for life-course epidemiology, and hence we do not consider these works readily applicable within this field.

Our suggested testing procedure allows only for numeric and binary variables. Extending the procedure to accommodate other data types, including nominal categorical variables and censored time-to-event variables, would be useful. However, we do not aim to find a truly general test for conditional independence that accommodates all variable types without parametric assumptions; unfortunately, it can be mathematically proven that such a test does not exist (35). Therefore, if we are to test conditional independence in prac-
not sparse by nature; rather, they rely on complex causal tice, we will have to be pragmatic in one of 2 ways: Either we
mechanisms that consist of myriad minor causal effects need to adopt parametric assumptions that might not be fully
along with a few stronger ones (similar to De Stavola et al. satisfied (for example, linearity), or we will have to accept
(8)). Hence, we do not aim to learn the “correct” sparsity that we are testing necessary rather than sufficient conditions
level and the corresponding “correct” life-course model; for conditional independence (for example, no association).
rather, our method relies on the researcher choosing how We suggest that empirical studies using TPC conduct sen-
sparse a model is of interest, and then our method finds the sitivity analyses to assess the robustness of their results
strongest causal relationships available in the data at that with different choices of independence tests, for example,
level of sparsity. We consider the very large retention rates comparing our proposed regression-based information loss
in Table 1 to be an indication that we are actually achieving test with testing for vanishing partial correlations.
this ideal for sparsity control. An interesting avenue for future research is accommodat-
In the application, the time periods were given by the data- ing correlated data in causal discovery. TPC assumes inde-
collection times. In other settings, researchers will have to pendent observations, and hence further work will be needed
define the time periods themselves. In the oracle setting, to handle, for example, spatial data, sibling or twin designs,
exactly how the periods are defined should not affect the and other multilevel designs. To our knowledge, none of the
overall results: As long as the temporal order is retained, available methods for causal discovery accommodate such
changing the periods would correspond to removing or data. All these methods are local in the sense that they use
inserting intermediate variables on direct causal links in the heuristic searches to sequentially go through possible edges
model, and this does not affect the overall causal structure. in the graph. Therefore, we believe it will not be straight-
However, in empirical settings, the TPC algorithm might be forward to extend them to take into account, for example,
more sensitive toward how the periods are defined due to symmetry constraints (as imposed by sibling designs) or
finite sample properties (8). contamination effects (as imposed by spatial models).
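
To illustrate one of the tests named in the sensitivity-analysis suggestion above, the following minimal Python sketch tests for a vanishing partial correlation using Fisher's z-transformation. It assumes independent, approximately jointly Gaussian observations. The function names (pearson, partial_corr, fisher_z_pvalue) and the simulated variables early, mid, and late are hypothetical and are not taken from the causalDisco package or from the article's application.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(data, a, b, cond=()):
    """Partial correlation of a and b given the variables in cond.

    Uses the standard recursion that removes one conditioning
    variable at a time. data maps variable names to lists of floats.
    """
    if not cond:
        return pearson(data[a], data[b])
    k, rest = cond[0], cond[1:]
    r_ab = partial_corr(data, a, b, rest)
    r_ak = partial_corr(data, a, k, rest)
    r_bk = partial_corr(data, b, k, rest)
    return (r_ab - r_ak * r_bk) / math.sqrt((1 - r_ak ** 2) * (1 - r_bk ** 2))


def fisher_z_pvalue(r, n, n_cond):
    """Two-sided p-value for H0: partial correlation equals zero."""
    r = max(min(r, 0.999999), -0.999999)  # guard against log of zero
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - n_cond - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))  # equals 2 * (1 - Phi(stat))


# Simulated chain early -> mid -> late (illustrative data only).
random.seed(1)
n = 2000
early = [random.gauss(0, 1) for _ in range(n)]
mid = [e + random.gauss(0, 1) for e in early]
late = [m + random.gauss(0, 1) for m in mid]
data = {"early": early, "mid": mid, "late": late}

r_marginal = partial_corr(data, "early", "late")
p_marginal = fisher_z_pvalue(r_marginal, n, 0)
r_given_mid = partial_corr(data, "early", "late", ("mid",))
p_given_mid = fisher_z_pvalue(r_given_mid, n, 1)
```

In this simulated chain, early and late are clearly correlated marginally, whereas their partial correlation given mid should be close to zero. A sensitivity analysis of the kind suggested above would rerun the discovery procedure with a test like this in place of the regression-based test and compare the resulting graphs.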


In the application, we noted that the life course was modeled retrospectively. This means that the resulting models should be thought of as descriptions of data-generating mechanisms rather than tools for designing interventions on, for example, children, because the variables are conditional on being alive at age 65 years. They are thus not representative of all children in the relevant background population. A natural first step to overcome this limitation will be to incorporate right censoring and an absorbing state (death) into the conditional-independence testing procedure, for example, by using methods for survival analysis with competing risks (36). This would also make it possible to include individuals that are available only for some of the time periods by constructing relevant risk sets, which would be very useful because life-course studies often suffer from rather large loss to follow-up (7, 8, 37, 38).

The most critical limitation for interpretability of the TPC algorithm output is reliance on causal sufficiency. Causal sufficiency is both untestable and often not very realistic in practical applications within epidemiology, but it is crucial because it is mathematically impossible to infer a CPDAG without this assumption. However, it is possible to infer a more general class of models, namely maximal ancestral graphs, using, for example, the fast causal inference (FCI) algorithm (18). However, maximal ancestral graphs are fundamentally different from DAGs; they are not a simple generalization because they do not marginalize to DAGs. We believe that most epidemiologists are currently not sufficiently familiar with maximal ancestral graphs, and hence, in order for discovery without causal sufficiency to become truly useful, this “new” graphical object will need to be introduced broadly in the field. Until this has been achieved, we believe that the TPC algorithm, combined with skepticism about the validity of the causal sufficiency assumption, is an acceptable first step.

It might be possible to obtain meaningful, although noncausal, interpretations of the TPC result under less-restrictive assumptions. The skeleton construction step of the TPC algorithm does not rely on assumptions C2 and C3, and even if only S1 is satisfied, it produces a correct skeleton. In the latter case, the skeleton will be a correct description of separating sets in the observed data, and a conservative description of separating sets in the data-generating mechanism, in the sense that spurious conditional dependencies might be present (18). If C1 is also satisfied, the skeleton will describe separating sets in the data-generating mechanism. However, the theory and interpretation of DAG skeletons is currently not very well-developed, so further research is needed for this to become useful.

ACKNOWLEDGMENTS

Author affiliations: Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark (Anne H. Petersen, Claus T. Ekstrøm); Center for Clinical Research and Prevention, Bispebjerg and Frederiksberg Hospitals, Copenhagen, Denmark (Merete Osler); and Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark (Merete Osler).

This work was funded by the Independent Research Fund Denmark (grant 8020-00031B).

Data availability statement: The data used in the example (temporal PC on simulated data) can be reproduced using the replication code available in Web Appendix 5. The data used in the application (development of depression in Danish men) are not available online for replication because they cannot be anonymized. Researchers interested in gaining access to the data may contact the Public Health Database at the Department of Public Health, University of Copenhagen.

Conflict of interest: none declared.

REFERENCES

1. Lynch J, Smith GD. A life course approach to chronic disease epidemiology. Annu Rev Public Health. 2005;26:1–35.
2. Barker DJ. Fetal origins of cardiovascular disease. Ann Med. 1999;31(suppl 1):3–6.
3. Marshall SM. A life course perspective on diabetes: developmental origins and beyond. Diabetologia. 2019;62(10):1737–1739.
4. Papachristou E, Frangou S, Reichenberg A. Expanding conceptual frameworks: life course risk modelling for mental disorders. Psychiatry Res. 2013;206(2–3):140–145.
5. Colman I, Ataullahjan A. Life course perspectives on the epidemiology of depression. Can J Psychiatry. 2010;55(10):622–632.
6. Osler M, Rostrup E, Nordentoft M, et al. Influence of early life characteristics on psychiatric admissions and impact of psychiatric disease on inflammatory biomarkers and survival: a Danish cohort study. World Psychiatry. 2015;14(3):364–365.
7. Kuh D, Ben-Shlomo Y, Lynch J, et al. Life course epidemiology. J Epidemiol Community Health. 2003;57(10):778–783.
8. De Stavola BL, Nitsch D, dos Santos Silva I, et al. Statistical issues in life course epidemiology. Am J Epidemiol. 2006;163(1):84–96.
9. Gamborg M, Andersen PK, Baker JL, et al. Life course path analysis of birth weight, childhood growth and adult systolic blood pressure. Am J Epidemiol. 2009;169(10):1167–1178.
10. Keyes KM, Galea S. The limits of risk factors revisited: is it time for a causal architecture approach? Epidemiology. 2017;28(1):1–5.
11. Baird J, Jacob C, Barker M, et al. Developmental origins of health and disease: a lifecourse approach to the prevention of non-communicable diseases. Healthcare (Basel). 2017;5(1):14.
12. Smith ADAC, Heron J, Mishra G, et al. Model selection of the effect of binary exposures over the life course. Epidemiology. 2015;26(5):719–726.
13. Spirtes P, Glymour C. An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review. 1991;9(1):62–72.
14. Rosenström T, Jokela M, Puttonen S, et al. Pairwise measures of causal direction in the epidemiology of sleep problems and depression. PLoS One. 2012;7(11):e50841.
15. Hernán M, Robins J. Causal Inference: What If. Boca Raton, FL: Chapman & Hall/CRC; 2020.


16. Pearl J. Causality. New York, NY: Cambridge University Press; 2009.
17. Peters J, Janzing D, Schölkopf B. Elements of Causal Inference: Foundations and Learning Algorithms. Cambridge, MA: MIT Press; 2017.
18. Spirtes P, Glymour CN, Scheines R, et al. Causation, Prediction, and Search. Cambridge, MA: MIT Press; 2000.
19. Kalisch M, Bühlmann P. Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J Mach Learn Res. 2007;8:613–636.
20. Petersen AH. causalDisco: Tools for causal discovery [R package]. https://github.com/annennenne/causalDisco. Accessed June 14, 2021.
21. Osler M, Lund R, Kriegbaum M, et al. Cohort profile: the Metropolit 1953 Danish male birth cohort. Int J Epidemiol. 2006;35(3):541–545.
22. Knop MR, Geng TT, Gorny AW, et al. Birth weight and risk of type 2 diabetes mellitus, cardiovascular disease, and hypertension in adults: a meta-analysis of 7 646 267 participants from 135 studies. J Am Heart Assoc. 2018;7(23):e008870.
23. Leon DA, Lithell HO, Vågerö D, et al. Reduced fetal growth rate and increased risk of death from ischaemic heart disease: cohort study of 15 000 Swedish men and women born 1915–29. BMJ. 1998;317(7153):241–245.
24. de Mola CL, de França GVA, de Avila Quevedo L, et al. Low birth weight, preterm birth and small for gestational age association with adult depression: systematic review and meta-analysis. Br J Psychiatry. 2014;205(5):340–347.
25. Hernán MA, Robins JM. Estimating causal effects from epidemiological data. J Epidemiol Community Health. 2006;60(7):578–586.
26. Lucas A, Fewtrell MS, Cole TJ. Fetal origins of adult disease—the hypothesis revisited. BMJ. 1999;319(7204):245–249.
27. Tognini P. Gut microbiota: a potential regulator of neurodevelopment. Front Cell Neurosci. 2017;11:25.
28. Hallqvist J, Lynch J, Bartley M, et al. Can we disentangle life course processes of accumulation, critical period and social mobility? An analysis of disadvantaged socio-economic positions and myocardial infarction in the Stockholm Heart Epidemiology Program. Soc Sci Med. 2004;58(8):1555–1562.
29. Hastie T, Tibshirani R, Friedman J. Model assessment and selection. In: Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. New York, NY: Springer; 2009:219–259.
30. Meek C. Causal explanation with background knowledge. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence. Montreal, Canada: Morgan Kaufmann; 1995:403–410.
31. Perković E, Kalisch M, Maathuis MH. Interpreting and using CPDAGs with background knowledge. In: Proceedings of the 2017 Conference on Uncertainty in Artificial Intelligence. Sydney, Australia: AUAI Press; 2017.
32. Kalisch M, Mächler M, Colombo D, et al. Causal inference using graphical models with the R package pcalg. J Stat Softw. 2012;47(i11):1–26.
33. Moneta A, Spirtes P. Graphical models for the identification of causal structures in multivariate time series models. In: Proceedings of the 9th Joint International Conference on Information Sciences. Dordrecht, the Netherlands: Atlantic Press; 2006:613–616.
34. Chu T, Glymour C. Search for additive nonlinear time series causal models. J Mach Learn Res. 2008;9:967–991.
35. Shah RD, Peters J. The hardness of conditional independence testing and the generalised covariance measure. Ann Statist. 2020;48(3):1514–1538.
36. Andersen PK, Geskus RB, de Witte T, et al. Competing risks in epidemiology: possibilities and pitfalls. Int J Epidemiol. 2012;41(3):861–870.
37. Banks J, Muriel A, Smith JP. Attrition and health in ageing studies: evidence from ELSA and HRS. Longit Life Course Stud. 2011;2(2).
38. Kelfve S, Fors S, Lennartsson C. Getting better all the time? Selective attrition and compositional changes in longitudinal and life-course studies. Longit Life Course Stud. 2017;8(1):104–119.

APPENDIX 1

Algorithm 1: PC (Peter-Clark) algorithm

1. Construct a fully connected graph over the variables X.
2. Learn the graph skeleton:
   For increasing d = 0, ..., do:
      For each pair of adjacent variables A and B, do:
         2.1. Let X_AB be variables in X \ {A, B} that are connected to A.
         2.2. Search for a separating set S of size d in X_AB such that A ⊥⊥ B | S. If such an S is found, remove the edge between A and B.
3. Orient edges:
   3.1. For each structure A − B − C with A and C not adjacent: orient as A → B ← C if B ∉ S for all S such that A ⊥⊥ C | S.
   3.2. Recursively apply additional orientation rules to ensure that no cycles are introduced and no further v-structures are created.
4. Output completed partially directed acyclic graph (CPDAG).

APPENDIX 2

Algorithm 2: Temporal PC algorithm

1. Construct a fully connected graph over the variables X.
2. Learn the skeleton:
   For increasing d = 0, ..., do:
      For each pair of adjacent variables A and B, do:
         2.1. Let X̃_AB be variables in X \ {A, B} that are connected to A and that occur no later than the latest variable among A and B.
         2.2. Search for a separating set S of size d in X̃_AB such that A ⊥⊥ B | S. If such an S is found, remove the edge between A and B.
3. Orient edges:
   3.0. Direct any edges that move across 2 time periods: For any adjacent pair (A, B) with A occurring strictly before B, direct the edge: A → B.
   3.1. For each structure A − B − C with A and C not adjacent: orient as A → B ← C if B ∉ S for all S such that A ⊥⊥ C | S and neither A ← B nor B → C after step 2.1.
   3.2. Recursively apply additional rules from the Peter-Clark (PC) algorithm (no cycles, no further v-structures).
4. Output temporal partially directed acyclic graph (TPDAG).
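
As an illustration of Algorithm 2, the following Python sketch implements only the skeleton step (steps 1 and 2) and the cross-period orientation step (step 3.0); steps 3.1 and 3.2 are omitted. It is a simplified sketch under stated assumptions and not the causalDisco implementation: the function names, the period mapping, and the oracle independence test for a hypothetical chain early → mid → late are all invented for the example.

```python
from itertools import combinations


def tpc_skeleton(variables, period, indep):
    """Steps 1-2 of the temporal PC algorithm (sketch).

    variables: list of variable names.
    period: dict mapping each name to an integer time period.
    indep: callable indep(a, b, S) -> bool, a conditional independence test.
    """
    # Step 1: fully connected undirected graph.
    adj = {v: set(variables) - {v} for v in variables}
    sepset = {}
    d = 0
    # Step 2: grow the conditioning-set size d while some node has enough neighbors.
    while any(len(adj[a]) - 1 >= d for a in variables):
        for a in variables:
            for b in sorted(adj[a]):
                latest = max(period[a], period[b])
                # Step 2.1: neighbors of a that occur no later than the later of a and b.
                candidates = sorted(
                    v for v in adj[a] - {b} if period[v] <= latest
                )
                # Step 2.2: look for a separating set of size d.
                for S in combinations(candidates, d):
                    if indep(a, b, set(S)):
                        adj[a].discard(b)
                        adj[b].discard(a)
                        sepset[frozenset((a, b))] = set(S)
                        break
        d += 1
    return adj, sepset


def orient_across_periods(adj, period):
    """Step 3.0: direct every edge that crosses time periods forward in time."""
    directed, undirected = set(), set()
    for a in adj:
        for b in adj[a]:
            if period[a] < period[b]:
                directed.add((a, b))
            elif period[a] == period[b] and a < b:
                undirected.add((a, b))
    return directed, undirected


# Hypothetical oracle for the chain early -> mid -> late:
# the only independence is early _||_ late given mid.
calls = []


def oracle_indep(a, b, S):
    calls.append((a, b, frozenset(S)))
    return {a, b} == {"early", "late"} and "mid" in S


variables = ["early", "mid", "late"]
period = {"early": 0, "mid": 1, "late": 2}
adj, sepset = tpc_skeleton(variables, period, oracle_indep)
directed, undirected = orient_across_periods(adj, period)
# directed is {("early", "mid"), ("mid", "late")}; undirected is empty
```

The recorded calls show the temporal restriction at work: the test for early and mid is never asked to condition on late, because late occurs after both. A statistical test estimated from data can be passed as indep in place of the oracle.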
