Using Bayesian Networks To Analyze Expression Data
We are therefore able to focus on interactions whose signal in the data is strong.

Bayesian networks are a promising tool for analyzing gene expression patterns. First, they are particularly useful for describing processes composed of locally interacting components; that is, the value of each component directly depends on the values of a relatively small number of components. Second, statistical foundations for learning Bayesian networks from observations, and computational algorithms to do so, are well understood and have been used successfully in many applications. Finally, Bayesian networks provide models of causal influence. Although Bayesian networks are mathematically defined strictly in terms of probabilities and conditional independence statements, a connection can be made between this characterization and the notion of direct causal influence [22, 32, 36].

The remainder of this paper is organized as follows. In Section 2, we review key concepts of Bayesian networks, learning them from observations, and using them to infer causality. In Section 3, we describe how Bayesian networks can be applied to model interactions among genes and discuss the technical issues that are posed by this type of data. In Section 4, we apply our approach to the gene-expression data of Spellman et al. [35], analyzing the statistical significance of the results and their biological plausibility. Finally, in Section 5, we conclude with a discussion of related approaches and future work.

2 Bayesian Networks

2.1 Representing Distributions with Bayesian Networks

Consider a finite set 𝒳 = {X_1, ..., X_n} of random variables where each variable X_i may take on a value x_i from the domain Val(X_i). In this paper, we focus on finite domains, though much of the following holds for infinite domains, such as continuous-valued random variables. We use capital letters, such as X, Y, Z, for variable names and lowercase letters x, y, z to denote specific values taken by those variables. Sets of variables are denoted by boldface capital letters X, Y, Z, and assignments of values to the variables in these sets are denoted by boldface lowercase letters x, y, z. We write I(X; Y | Z) to mean X is independent of Y conditioned on Z.

A Bayesian network is a representation of a joint probability distribution. This representation consists of two components. The first component, G, is a directed acyclic graph whose vertices correspond to the random variables X_1, ..., X_n. The second component describes a conditional distribution for each variable, given its parents in G. Together, these two components specify a unique distribution on X_1, ..., X_n.

The graph G represents conditional independence assumptions that allow the joint distribution to be decomposed, economizing on the number of parameters. The graph G encodes the Markov Assumption:

(*) Each variable X_i is independent of its non-descendants, given its parents in G.

By applying the chain rule of probabilities and properties of conditional independencies, any joint distribution that satisfies (*) can be decomposed in the product form

P(X_1, ..., X_n) = ∏_{i=1}^{n} P(X_i | Pa(X_i)),

where Pa(X_i) is the set of parents of X_i in G. Figure 1 shows an example of a graph G, lists the Markov independencies it encodes, and the product form they imply.

To specify a joint distribution, we also need to specify the conditional probabilities that appear in the product form. This is the second component of the network representation. This component describes distributions P(x_i | pa(X_i)) for each possible value x_i of X_i and pa(X_i) of Pa(X_i). In the case of finite-valued variables, we can represent these conditional distributions as tables. Generally, Bayesian networks are flexible and can accommodate many forms of conditional distribution, including various continuous models.

Given a Bayesian network, we might want to answer many types of questions that involve the joint probability (e.g., what is the probability of X = x given observation of some of the other variables?) or independencies in the domain (e.g., are X and Y independent once we observe Z?). The literature contains a suite of algorithms that can answer such queries (e.g., see [25, 31]), exploiting the explicit representation of structure in order to answer queries efficiently.

2.2 Equivalence Classes of Bayesian Networks

A Bayesian network structure G implies a set of independence assumptions in addition to (*). Let Ind(G) be the set of independence statements (of the form X is independent of Y given Z) that hold in all distributions satisfying these Markov assumptions. These can be derived as consequences of (*).

More than one graph can imply exactly the same set of independencies. For example, consider graphs over two variables X and Y. The graphs X → Y and X ← Y both imply the same set of independencies (i.e., Ind(G) = ∅). We say that two graphs G and G' are equivalent if Ind(G) = Ind(G').

This notion of equivalence is crucial, since when we examine observations from a distribution, we cannot distinguish between equivalent graphs. Results of [7, 32] show that we can characterize equivalence classes of graphs using a simple representation. In particular, these results establish that equivalent graphs have the same underlying undirected graph but might disagree on the direction of some of the arcs.

Theorem 2.1 [32] Two graphs are equivalent if and only if their DAGs have the same underlying undirected graph and the same v-structures (i.e., converging directed edges into the same node, such as a → b ← c).

Moreover, an equivalence class of network structures can be uniquely represented by a partially directed graph (PDAG), where a directed edge X → Y denotes that all members of the equivalence class contain the arc X → Y; an undirected edge X — Y denotes that some members of the class contain the arc X → Y, while others contain the arc Y → X. Given a directed graph G, the PDAG representation of its equivalence class can be constructed efficiently [7].

2.3 Learning Bayesian Networks

The problem of learning a Bayesian network can be stated as follows. Given a training set D = {x^1, ..., x^N} of independent instances of 𝒳, find a network B = (G, Θ) that best matches D. (More precisely, we search for an equivalence class of networks that best matches D.) The common approach to this problem is to introduce a statistically motivated scoring function that evaluates each network with respect to the training data, and to search for the optimal network according to this score.

A commonly used scoring function is the Bayesian score (see [10, 21] for a complete description):

S(G : D) = log P(G | D) = log P(D | G) + log P(G) + C,
Figure 1: An example of a simple Bayesian network structure.
This network structure implies several conditional independence statements:
I(A; E), I(B; D | A, E), I(C; A, D, E | B), I(D; B, C, E | A), and I(E; A, D).
The network structure also implies that the joint distribution has the product form
P(A, B, C, D, E) = P(A) P(B | A, E) P(C | B) P(D | A) P(E).
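To make the product form in the caption concrete, the following sketch evaluates a factored joint for a structure with the parent sets implied by the listed independencies (Pa(B) = {A, E}, Pa(C) = {B}, Pa(D) = {A}). The CPT numbers below are invented for illustration only; they are not from the paper.

```python
from itertools import product

# Hypothetical CPTs for a structure with the parent sets implied by the
# caption: Pa(A) = Pa(E) = {}, Pa(B) = {A, E}, Pa(C) = {B}, Pa(D) = {A}.
# All variables are binary; the numeric values are invented.
P_A = {0: 0.6, 1: 0.4}
P_E = {0: 0.7, 1: 0.3}
P_B = {(a, e): {1: 0.1 + 0.3 * a + 0.2 * e, 0: 0.9 - 0.3 * a - 0.2 * e}
       for a in (0, 1) for e in (0, 1)}
P_C = {b: {1: 0.2 + 0.5 * b, 0: 0.8 - 0.5 * b} for b in (0, 1)}
P_D = {a: {1: 0.5 - 0.4 * a, 0: 0.5 + 0.4 * a} for a in (0, 1)}

def joint(a, b, c, d, e):
    """P(a, b, c, d, e) via the product form P(A)P(B|A,E)P(C|B)P(D|A)P(E)."""
    return P_A[a] * P_B[(a, e)][b] * P_C[b][c] * P_D[a][d] * P_E[e]

# Sanity check: the factored joint sums to 1 over all 2**5 assignments.
total = sum(joint(*v) for v in product((0, 1), repeat=5))
print(round(total, 10))  # 1.0
```

Note that the full joint over five binary variables has 31 free parameters, while this factorization needs only 1 + 1 + 4 + 2 + 2 = 10, illustrating the economy discussed in Section 2.1.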
where C is a constant independent of G and

P(D | G) = ∫ P(D | G, Θ) P(Θ | G) dΘ

is the marginal likelihood, which averages the probability of the data over all possible parameter assignments to G. The particular choice of priors P(G) and P(Θ | G) for each G determines the exact Bayesian score. Under mild assumptions on the prior probabilities, this scoring metric is asymptotically consistent: given a sufficiently large number of samples, graph structures that exactly capture all dependencies in the distribution will receive, with high probability, a higher score than all other graphs [19]. This means that, given a sufficiently large number of instances, learning procedures can pinpoint the exact network structure up to the correct equivalence class.

Heckerman et al. [21] present a family of priors, called BDe priors, that satisfy two important requirements. First, these priors are structure equivalent, i.e., if G and G' are equivalent structures they are guaranteed to have the same score. Second, the priors are decomposable. That is, the score can be rewritten as the sum

S_BDe(G : D) = Σ_i ScoreContribution_BDe(X_i, Pa(X_i) : D),

where the contribution of every variable X_i to the total network score depends only on its own value and the values of its parents in G. These two properties are satisfied for BDe priors when all instances x^t in D are complete, that is, they assign values to all the variables in 𝒳.

Once the prior is specified (we use an uninformative prior in our experiments) and the data is given, learning amounts to finding the structure G that maximizes the score. This problem is known to be NP-hard [8], thus we resort to heuristic search. The decomposition of the score is crucial for this optimization problem. A local search procedure that changes one arc at each move can efficiently evaluate the gains made by adding, removing, or reversing a single arc. An example of such a procedure is a greedy hill-climbing algorithm that at each step performs the local change that results in the maximal gain, until it reaches a local maximum. Although this procedure does not necessarily find a global maximum, it does perform well in practice. Examples of other search methods that advance using one-arc changes include beam search, stochastic hill climbing, and simulated annealing.
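The search scheme described above can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: it substitutes a BIC-style local score for the BDe contribution, assumes binary variables stored as dicts, and considers only single-arc additions and removals (the reversal move mentioned in the text is omitted for brevity). Thanks to decomposability, each candidate move only requires re-scoring the one family it changes.

```python
import itertools
import math

def family_score(data, child, parents):
    """BIC-style local score of one family: log-likelihood of `child`
    given its parent configuration, minus a complexity penalty. A
    stand-in for the per-family BDe contribution discussed in the text;
    variables are assumed binary, rows are dicts."""
    counts = {}
    for row in data:
        key = (tuple(row[p] for p in parents), row[child])
        counts[key] = counts.get(key, 0) + 1
    loglik = 0.0
    for (pa_cfg, _), n in counts.items():
        n_pa = sum(m for (q, _), m in counts.items() if q == pa_cfg)
        loglik += n * math.log(n / n_pa)
    n_params = 2 ** len(parents)  # free parameters for binary variables
    return loglik - 0.5 * math.log(len(data)) * n_params

def is_acyclic(parents):
    """Check that the parent-set map describes a DAG (DFS for back edges)."""
    visiting, done = set(), set()
    def visit(v):
        if v in done:
            return True
        if v in visiting:
            return False  # back edge => cycle
        visiting.add(v)
        ok = all(visit(p) for p in parents[v])
        done.add(v)
        return ok
    return all(visit(v) for v in parents)

def hill_climb(data, variables):
    """Greedy one-arc-change search: repeatedly apply the single
    add/remove move with the largest score gain until none improves."""
    parents = {v: frozenset() for v in variables}
    scores = {v: family_score(data, v, ()) for v in variables}
    while True:
        best = None
        for x, y in itertools.permutations(variables, 2):
            # candidate move: toggle the arc x -> y (add or remove)
            new = parents[y] - {x} if x in parents[y] else parents[y] | {x}
            trial = dict(parents)
            trial[y] = new
            if not is_acyclic(trial):
                continue
            gain = family_score(data, y, tuple(sorted(new))) - scores[y]
            if gain > 1e-9 and (best is None or gain > best[0]):
                best = (gain, y, new)
        if best is None:
            return parents  # local maximum reached
        _, y, new = best
        parents[y] = new
        scores[y] = family_score(data, y, tuple(sorted(new)))
```

On data where two variables are strongly dependent, this sketch recovers an arc between them; as Section 2.2 explains, the direction of that single arc is not identifiable from the score alone.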
2.4 Learning Causal Patterns

A Bayesian network is a model of dependencies between multiple measurements. We are also interested in modeling the process that generated these dependencies. Thus, we need to model the flow of causality in the system of interest (e.g., gene transcription). A causal network is a model of such causal processes. A causal network is similar to a Bayesian network (i.e., a DAG where each node represents a random variable along with a local probability model for each node), the difference being that it interprets the parents of a variable as its immediate causes.

We can relate causal networks and Bayesian networks by assuming the Causal Markov Assumption: given the values of a variable's immediate causes, it is independent of its earlier causes. When the causal Markov assumption holds, the causal network satisfies the Markov independencies of the corresponding Bayesian network, thus allowing us to treat causal networks as Bayesian networks. For example, this assumption is a natural one in models of genetic pedigrees: once we know the genetic makeup of the individual's parents, the genetic makeup of her ancestors is not informative about her own genetic makeup.

The main difference between causal and Bayesian networks is that a causal network models not only the distribution of the observations, but also the effects of interventions. If X causes Y, then manipulating the value of X (i.e., setting it to another value in such a way that the manipulation itself does not affect the other variables) affects the value of Y. On the other hand, if Y is a cause of X, then manipulating X will not affect Y. Thus, although the Bayesian networks X → Y and X ← Y are equivalent, as causal networks they are not.

When can we learn a causal network from observations? This issue received a thorough treatment in the literature [22, 32, 36]. From observations alone, we cannot distinguish between causal networks that specify the same independence assumptions, i.e., belong to the same equivalence class. When learning an equivalence class (PDAG) from the data, we can conclude that the true causal network is possibly any one of the networks in this class. If a directed arc X → Y is in the PDAG, then all the networks in the equivalence class agree that X is an immediate cause of Y. Thus, we can infer the causal direction of the interaction between X and Y. We stress that we can infer such causal relations without any experimental intervention (e.g., knockouts and over-expressions) among our samples.

3 Applying Bayesian Networks to Expression Data

In this section we describe our approach to analyzing gene expression data using Bayesian network learning techniques. We model the expression level of each gene as a random variable. In addition, other attributes that affect the system can be modeled as random variables. These can include a variety of attributes of the sample, such as experimental conditions, temporal indicators (i.e., the time/stage that the sample was taken from), background variables (e.g., which clinical procedure was used to get a biopsy sample), and exogenous cellular conditions.

By learning a Bayesian network based on the statistical dependencies between these variables, we can answer a wide range of queries about the system. For example, does the expression level of a particular gene depend on the experimental condition? Is this dependence direct or indirect? If it is indirect, which genes mediate the dependency? We now describe how one can learn such a model from the gene expression data. Many important issues arise when learning in this domain. These involve statistical aspects of
interpreting the results, algorithmic complexity issues in learning from the data, and preprocessing the data.

Most of the difficulties in learning from expression data revolve around the following central point: contrary to previous applications of learning Bayesian networks, expression data involves transcript levels of thousands of genes while current data sets contain at most a few dozen samples. This raises problems in computational complexity and the statistical significance of the resulting networks. On the positive side, genetic regulation networks are sparse, i.e., given a gene, it is assumed that no more than a few dozen genes directly affect its transcription. Bayesian networks are especially suited for learning in such sparse domains.

3.1 Representing Partial Models

When learning models with many variables, small data sets are not sufficiently informative to significantly determine that a single model is the "right" one. Instead, many different networks should be considered as reasonable explanations of the given data. From a Bayesian perspective, we say that the posterior probability over models is not dominated by a single model (or equivalence class of models).¹ Our approach is to analyze this set of plausible (i.e., high-scoring) networks. Although this set can be very large, we might attempt to characterize features that are common to most of these networks, and focus on learning them.

Before we examine the issue of inferring such features, we briefly discuss two classes of features involving pairs of variables. While at this point we handle only pairwise features, it is clear that this analysis is not restricted to them, and in the future we are planning on examining more complex features.

The first type of features is Markov relations: Is Y in the Markov blanket of X? The Markov blanket of X is the minimal set of variables that shield X from the rest of the variables in the model. More precisely, X given its Markov blanket is independent from the remaining variables in the network. It is easy to check that this relation is symmetric: Y is in X's Markov blanket if and only if there is either an edge between them, or both are parents of another variable [31]. In the context of gene expression analysis, a Markov relation indicates that the two genes are related in some joint biological interaction or process. Note, two variables in a Markov relation are directly linked in the sense that no variable in the model mediates the dependence between them. It remains possible that an unobserved variable (e.g., protein activation) is an intermediate in their interaction.

The second type of features is order relations: Is X an ancestor of Y in all the networks of a given equivalence class? That is, does the given PDAG contain a path from X to Y in which all the edges are directed? This type of feature does not involve only a close neighborhood, but rather captures a global property. Recall that under the assumptions of Section 2.4, learning that X is an ancestor of Y would imply that X is a cause of Y. However, these assumptions do not necessarily hold in the context of expression data. Thus, we view such a relation as an indication, rather than evidence, that X might be a causal ancestor of Y.

¹This observation is not unique to Bayesian network models. It equally well applies to other models that are learned from gene expression data, such as clustering models.

3.2 Estimating Statistical Confidence in Features

We now face the following problem. To what extent do the data support a given feature? More precisely, we want to estimate a measure of confidence in the features of the learned networks, where "confidence" approximates the likelihood that a given feature is actually true (i.e., is based on a genuine correlation and causation). An effective, and relatively simple, approach for estimating confidence is the bootstrap method [14]. The main idea behind the bootstrap is simple. We generate "perturbed" versions of our original data set, and learn from them. In this way we collect many networks, all of which are fairly reasonable models of the data. These networks show how small perturbations to the data can affect many of the features.

In our context, we use the bootstrap as follows:

• For i = 1, ..., m (in our experiments, we set m = 200):

  – Re-sample, with replacement, N instances from D. Denote by D_i the resulting dataset.

  – Apply the learning procedure on D_i to induce a network structure G_i.

• For each feature f of interest calculate

  conf(f) = (1/m) Σ_{i=1}^{m} f(G_i),

where f(G) is 1 if f is a feature in G, and 0 otherwise.

We refer the reader to [16] for more details, as well as large-scale simulation experiments with this method. These simulation experiments show that features induced with high confidence are rarely false positives, even in cases where the data sets are small compared to the system being learned. This bootstrap procedure appears especially robust for the Markov and order features described in Section 3.1.
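The bootstrap procedure above can be sketched generically. Here `learn` and `extract_features` stand for any structure-learning procedure and any feature extractor (e.g., one returning Markov pairs); both are caller-supplied assumptions of this sketch, not the paper's actual code.

```python
import random
from collections import Counter

def bootstrap_confidence(data, learn, extract_features, m=200, seed=0):
    """Estimate conf(f) = (1/m) * sum_i f(G_i) as in the procedure above.

    `learn` maps a dataset to a network structure G_i;
    `extract_features` maps G_i to the set of features it exhibits
    (e.g. Markov pairs). Both are hypothetical stand-ins."""
    rng = random.Random(seed)
    n = len(data)
    counts = Counter()
    for _ in range(m):
        # re-sample N instances with replacement from D
        resampled = [data[rng.randrange(n)] for _ in range(n)]
        g = learn(resampled)
        counts.update(set(extract_features(g)))  # f(G_i) is 0 or 1
    return {f: c / m for f, c in counts.items()}
```

Features that never appear in any replicate get no entry, i.e., confidence 0.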
3.3 Efficient Learning Algorithms

In Section 2.3, we formulated learning Bayesian network structure as an optimization problem in the space of directed acyclic graphs. The number of such graphs is super-exponential in the number of variables. As we consider hundreds and thousands of variables, we must deal with an extremely large search space. Therefore, we need to use (and develop) efficient search algorithms.

To facilitate efficient learning, we need to be able to focus the attention of the search procedure on relevant regions of the search space, giving rise to the Sparse Candidate algorithm [18]. The main idea of this technique is that we can identify a relatively small number of candidate parents for each gene based on simple local statistics (such as correlation). We then restrict our search to networks in which only the candidate parents of a variable can be its parents, resulting in a much smaller search space in which we can hope to find a good structure quickly.

A possible pitfall of this approach is that early choices can result in an overly restricted search space. To avoid this problem, we devised an iterative algorithm that adapts the candidate sets during search. At each iteration n, for each variable X_i, the algorithm chooses the set C_i^n = {Y_1, ..., Y_k} of variables which are the most promising candidate parents for X_i. We then search for B_n, an optimal network in which Pa_{G_n}(X_i) ⊆ C_i^n. The network found is then used to guide the selection of better candidate sets for the next iteration. We ensure that B_n monotonically improves in each iteration by requiring Pa_{G_{n-1}}(X_i) ⊆ C_i^n. The algorithm continues until there is no change in the candidate sets.
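The candidate-selection step can be sketched as follows. This is a simplified first-iteration version that ranks candidates by empirical mutual information only; later iterations would re-rank using the learned network, which is not shown, and the function names are our own.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical I(X; Y) in nats from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # sum over observed (x, y): p(x,y) * log( p(x,y) / (p(x) p(y)) )
    return sum((c / n) * math.log((c * n) / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def candidate_parents(data, target, variables, k):
    """Pick the k variables with highest I(target; X) as the initial
    candidate set C for `target` (first iteration of the algorithm)."""
    col = lambda v: [row[v] for row in data]
    scored = [(mutual_information(col(target), col(v)), v)
              for v in variables if v != target]
    scored.sort(reverse=True)
    return [v for _, v in scored[:k]]
```

As the text notes, a purely local score like this can drop a mediated but relevant variable (the I(B; D) > I(B; E) example), which is exactly why the iterative re-selection is needed.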
We briefly outline our method for choosing C_i^n. We assign each X_j some score of relevance to X_i, choosing the variables with the highest score. A natural score that measures the dependence between two variables is their mutual information, denoted I(X; Y). The following is an example that arises with such a score: consider the network in Figure 1. If I(B; D) > I(B; E), then for k = 2, E will be left out of C_B^1. Since A mediates the dependence between B and D, the network learned in this iteration will contain only A as B's parent. We can use this conditional independence to improve
4 Application to Cell Cycle Expression Patterns
Figure 3: Histograms of confidence levels for the cell cycle data set and the randomized data set. The histograms on the left are of order relations, and the ones on the right are of Markov relations. The histograms on the top row show the distribution of confidence levels in the interval [0, 1]. The histograms on the bottom row show the tails of these distributions for high-confidence features. These histograms are all based on the 250-gene data set.
the high confidence features to the choice of this threshold. This was tested by repeating the experiments using different threshold levels. Again, the graphs show a definite linear tendency in the confidence estimates of features between the different discretization thresholds. Obviously, this linear correlation gets weaker for larger threshold differences. We also note that order relations are much more robust to changes in the threshold than the Markov relations.

We believe that the results of this analysis can be indicative of biological phenomena in the data. This is confirmed by our ability to predict sensible relations between genes of known function. We now examine several consequences that we have learned from the data. We consider, in turn, the order relations and Markov relations found by our analysis.

³An undesired effect of such a normalization is the amplification of measurement noise. If a gene has fixed expression levels across samples, we expect the variance in measured expression levels to be noise either in the experimental conditions or the measurements. When we normalize the expression levels of genes, we lose the distinction between such noise and true (i.e., significant) changes in expression levels. In our experiments, we can safely assume this effect will not be too grave, since we only focus on genes that display significant changes across experiments.
Table 1: List of dominant genes in the ordering relations (top 14 out of 30).
4.2.1 Order Relations

The most striking feature of the high confidence order relations is the existence of dominant genes. Out of all 800 genes, only a few seem to dominate the order (i.e., appear before many genes). The intuition is that these genes are indicative of potential causal sources of the cell-cycle process. Let C_o(X, Y) denote the confidence in X being an ancestor of Y. We define the dominance score of X as

Σ_{Y : C_o(X, Y) > t} C_o(X, Y)^k,

using the constant k for rewarding high confidence features and the threshold t to discard low confidence ones. These dominant genes are extremely robust to parameter selection for both t, k and the discretization cutoff of Section 3.4. A list of the highest scoring dominant genes appears in Table 1.
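The dominance score can be sketched directly from its definition. Here `conf_ancestor` stands for the bootstrap confidence estimates C_o(X, Y), and the default values of t and k are arbitrary placeholders, not the parameters used in the paper.

```python
def dominance_score(x, conf_ancestor, genes, t=0.5, k=2):
    """Dominance score of gene x: sum over genes y with C_o(x, y) > t of
    C_o(x, y) ** k. `conf_ancestor[(x, y)]` holds the estimated confidence
    that x is an ancestor of y; missing pairs default to confidence 0.
    The defaults t=0.5 and k=2 are illustrative only."""
    return sum(conf_ancestor.get((x, y), 0.0) ** k
               for y in genes
               if y != x and conf_ancestor.get((x, y), 0.0) > t)
```

Raising k concentrates the score on near-certain ancestor relations, while t simply cuts off the low-confidence tail.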
Inspection of the list of dominant genes reveals quite a few interesting features. Among the dominant genes are those directly involved in cell-cycle control and initiation, for example CLN1, CLN2 and CDC5, whose functional relation has been established [11, 13]. Other genes, like MCD1 and RFA2, were found to be essential [20]. These are clearly key genes in basic cell functions, involved in chromosome dynamics and stability (MCD1) and in nucleotide excision repair (RFA2). Most of the dominant genes encode nuclear proteins, and some of the unknown genes are also potentially nuclear (e.g., YLR183C contains a forkhead-associated domain which is found almost entirely among nuclear proteins). Some of them are components of pre-replication complexes. Others (like RFA2, POL30 and MSH6) are involved in DNA repair. It is known that DNA repair is a prerequisite for transcription, and DNA areas which are more active in transcription are also repaired more frequently [28, 37].

A few non-nuclear dominant genes are localized in the cytoplasm membrane (SRO4 and RSR1). These are involved in the budding and sporulation processes, which have an important role in the cell cycle. RSR1 belongs to the ras family of proteins, which are known as initiators of signal transduction cascades in the cell.

4.2.2 Markov Relations

Inspection of the top Markov relations reveals that most are functionally related. A list of the top scoring relations can be found in Table 2. Among these, all involving two known genes make sense biologically. When one of the ORFs is unknown, careful searches using Psi-Blast [3], Pfam [34] and Protomap [40] can reveal firm homologies to proteins functionally related to the other gene in the pair (e.g., YHR143W, which is paired to the endochitinase CTS1, is related to EGT2, a cell wall maintenance protein). Several of the unknown pairs are physically adjacent on the chromosome, and thus presumably regulated by the same mechanism (see [5]), although special care should be taken for pairs whose chromosomal locations overlap on complementary strands, since in these cases we might see an artifact resulting from cross-hybridization. Such analysis raises the number of biologically sensible pairs to 19/20.

There are some interesting Markov relations found that are beyond the limitations of clustering techniques. One such regulatory link is FAR1–ASH1: both proteins are known to participate in a mating-type switch. The correlation of their expression patterns is low, and [35] cluster them into different clusters. Among the high confidence Markov relations, one can also find examples of conditional independence, i.e., a group of highly correlated genes whose correlation can be explained within our network structure. One such example involves the genes CLN2, RNR3, SVS1, SRO4 and RAD51: their expression is correlated, and in [35] all appear in the same cluster. In our network, CLN2 is with high confidence a parent of each of the other four genes, while no links are found between them. This suits biological knowledge: CLN2 is a central and early cell-cycle control, while there is no clear biological relationship between the others.

5 Discussion and Future Work

In this paper we presented a new approach for analyzing gene expression data that builds on the theory and algorithms for learning Bayesian networks. We described how to apply these techniques to gene expression data. The approach builds on two techniques that were motivated by the challenges posed by this domain: a novel search algorithm [18] and an approach for estimating statistical confidence [16]. We applied our methods to real expression data of Spellman et al. [35]. Although we did not use any prior knowledge, we managed to extract many biologically plausible conclusions from this analysis.

Our approach is quite different from the clustering approach used by [2, 4, 15, 29, 35], in that it attempts to learn a much richer structure from the data. Our methods are capable of discovering causal relationships, interactions between genes other than positive correlation, and finer intra-cluster structure. We are currently developing hybrid approaches that combine our methods with clustering algorithms to learn models over "clustered" genes.

The biological motivation of our approach is similar to work on inducing genetic networks from data [1, 6, 33, 38]. There are two key differences: First, the models we learn have probabilistic semantics. This better fits the stochastic nature of both the biological processes and noisy experimentation. Second, our focus is
Table 2: List of top Markov relations.
on extracting features that are pronounced tn the data, in contrast nual ACM-SIAM Sympostum on Discrete Algorithms. ACM-
to current genetm network approaches that attempt to find a single SLAM, 1998.
model that explams the data. [2] U. Alon, N. Barkai, D.A. Notterman, K. Gssh, S. Ybarra,
We are currently workmg on improving methods for expres- D. Mack, and A. J. Levme. Broad patterns of gene expression
sion analys~s by expanding the framework described m tfus work. revealed by clustenng analysis of tumor and normal colon tss-
Promising d~rect~ons for such extentions are: (a) Developing the sues probed by oligonucleoude arrays Proc. Nat. Acad. Sct.
theory for leammg local probabdtty models that are capable of USA, 96:6745-6750, 1999.
deahng wtth the continuous nature of the data; (b) lmprovmg the
theory and algorithms for esumatmg confidence levels, (c) Incor- [3] S. Altschul, L. Thomas, A. Schaffer, Z. Zhang, J. Zhang,
poratmg btoiogical knowledge (such as posssble regulatory regtons) W. Miller, and D. Lipman. Gapped blast and psi-blast: a
as prior knowledge to the analysts; (d) Improving our search heuris- new generation of protein database search programs. Nucletc
tins; (e) Applymg Dynamic Bayestan Networks ([17]) to temporal Acids Res, 25, 1997
expression data. [4] A. Ben-Dor, R. Shamir, and Z YakhinL Clustenng gene ex-
Finally, one of the most excmng longer term prospects of tMs pression patterns. Journal of Computatwnal Bzology, 6:281-
ime of research ts dtscovering causal patterns from gene expres- 297, 1999.
ston data We plan to build on and extend the theory for learning [5] T. Blumenthal. Gene clusters and polyclstromc transcription
causal relattons from data and apply st to gene expresston. The m eukaryotes. Bioessays, pp. 480-487, 1998.
theory of causal networks allows learning both from observattonal
data and mterventwnal data, where the experiment intervenes with [6] T. Chen, V. Fdkov, and S. Skiena. Identifying gene regulatory
some causal mechamsms of the observed system In gene expres- networks from expenmental data. In Proc. 3'rd Annual In-
sion context, we can model knockout/overexpressed mutants as ternational Conference on Computational Molecular B:ology
such interventsons. Thus, we can dessgn methods that deal wsth (RECOMB), 1999.
mtxed forms of data m a pnncipled manner (See [9] for a recent [7] D. M. Chlckenng. A transformational charactenzauon of
work in thss dsrecUon) In addmon, thss theory can provsde tools equivalent Bayesmn network structures. In Proc. Eleventh
for experimental design, that is, understanding wMch mterventsons Conference on Uncertainty m Arttfictal lntelhgence (UAI
are deemed most reformative to determmmg the causal structure m "95), pp 87-98. 1995.
the undedymg system. [8] D. M. Chickenng. Learmng Bayesian networks is NP-
Acknowledgements

The authors are grateful to Gill Bejerano, Hadar Benyaminy, David Engelberg, Moises Goldszmidt, Daphne Koller, Matan Ninio, Itzik Pe'er, and Gavin Sherlock for comments on drafts of this paper and useful discussions relating to this work. We also thank Matan Ninio for help in running and analyzing the robustness experiments. This work was supported through the generosity of the Michael Sacher Trust.

References

[1] S. Akutsu, T. Kuhara, O. Maruyama, and S. Miyano. Identification of gene regulatory networks by strategic gene disruptions and gene over-expressions. In Proc. Ninth Annual ...

[5] ... in eukaryotes. Bioessays, pp. 480-487, 1998.

[6] T. Chen, V. Filkov, and S. Skiena. Identifying gene regulatory networks from experimental data. In Proc. Third Annual International Conference on Computational Molecular Biology (RECOMB), 1999.

[7] D. M. Chickering. A transformational characterization of equivalent Bayesian network structures. In Proc. Eleventh Conference on Uncertainty in Artificial Intelligence (UAI '95), pp. 87-98, 1995.

[8] D. M. Chickering. Learning Bayesian networks is NP-complete. In D. Fisher and H.-J. Lenz, editors, Learning from Data: Artificial Intelligence and Statistics V. Springer-Verlag, 1996.

[9] G. Cooper and C. Yoo. Causal discovery from a mixture of experimental and observational data. In Proc. Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI '99), pp. 116-125, 1999.

[10] G. F. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309-347, 1992.

[11] F. Cvrckova and K. Nasmyth. Yeast G1 cyclins CLN1 and CLN2 and a GAP-like protein have a role in bud formation. EMBO J., 12:5277-5286, 1993.

[12] J. DeRisi, V. Iyer, and P. Brown. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 282:699-705, 1997.
[13] M. A. Drebot, G. C. Johnston, J. D. Friesen, and R. A. Singer. An impaired RNA polymerase II activity in Saccharomyces cerevisiae causes cell-cycle inhibition at START. Mol. Gen. Genet., 241:327-334, 1993.

[14] B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall, London, 1993.

[15] M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. Proc. Nat. Acad. Sci. USA, 95:14863-14868, 1998.

[16] N. Friedman, M. Goldszmidt, and A. Wyner. Data analysis with Bayesian networks: A bootstrap approach. In Proc. Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI '99), pp. 206-215, 1999.

[17] N. Friedman, K. Murphy, and S. Russell. Learning the structure of dynamic probabilistic networks. In Proc. Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI '98), pp. 139-147, 1998.

[18] N. Friedman, I. Nachman, and D. Pe'er. Learning Bayesian network structure from massive datasets: The "sparse candidate" algorithm. In Proc. Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI '99), pp. 196-205, 1999.

[19] N. Friedman and Z. Yakhini. On the sample complexity of learning Bayesian networks. In Proc. Twelfth Conference on Uncertainty in Artificial Intelligence (UAI '96), pp. 274-282, 1996.

[20] V. Guacci, D. Koshland, and A. Strunnikov. A direct link between sister chromatid cohesion and chromosome condensation revealed through the analysis of MCD1 in S. cerevisiae. Cell, 91(1):47-57, October 1997.

[21] D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197-243, 1995.

[22] D. Heckerman, C. Meek, and G. Cooper. A Bayesian approach to causal discovery. Technical Report MSR-TR-97-05, Microsoft Research, 1997.

[23] R. Hoffman and V. Tresp. Discovering structure in continuous variables using Bayesian networks. In Advances in Neural Information Processing Systems 8 (NIPS '96). MIT Press, 1996.

[24] V. R. Iyer, M. B. Eisen, D. T. Ross, G. Schuler, T. Moore, J. C. F. Lee, J. M. Trent, L. M. Staudt, J. Hudson, M. S. Boguski, D. Lashkari, D. Shalon, D. Botstein, and P. O. Brown. The transcriptional program in the response of human fibroblasts to serum. Science, 283:83-87, 1999.

[25] F. V. Jensen. An Introduction to Bayesian Networks. University College London Press, London, 1996.

[26] D. Koller, U. Lerner, and D. Angelov. A general algorithm for approximate inference and its application to hybrid Bayes nets. In Proc. Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI '99), pp. 324-333, 1999.

[27] D. J. Lockhart, H. Dong, M. C. Byrne, M. T. Follettie, M. V. Gallo, M. S. Chee, M. Mittmann, C. Wang, M. Kobayashi, H. Horton, and E. L. Brown. DNA expression monitoring by hybridization of high density oligonucleotide arrays. Nature Biotechnology, 14:1675-1680, 1996.

[28] W. G. McGregor. DNA repair, DNA replication, and UV mutagenesis. J. Investig. Dermatol. Symp. Proc., 4:1-5, 1999.

[29] G. S. Michaels, D. B. Carr, M. Askenazi, S. Fuhrman, X. Wen, and R. Somogyi. Cluster analysis and data visualization for large scale gene expression data. In Pac. Symp. Biocomputing, pp. 42-53, 1998.

[30] K. Murphy. Inference and learning in hybrid Bayesian networks. Technical Report CSD-98-990, U.C. Berkeley, 1998.

[31] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Francisco, Calif., 1988.

[32] J. Pearl and T. S. Verma. A theory of inferred causation. In Principles of Knowledge Representation and Reasoning: Proc. Second International Conference (KR '91), pp. 441-452, 1991.

[33] R. Somogyi, S. Fuhrman, M. Askenazi, and A. Wuensche. The gene expression matrix: Towards the extraction of genetic network architectures. In The Second World Congress of Nonlinear Analysts (WCNA), 1996.

[34] E. L. Sonnhammer, S. R. Eddy, E. Birney, A. Bateman, and R. Durbin. Pfam: multiple sequence alignments and HMM-profiles of protein domains. Nucl. Acids Res., 26:320-322, 1998. http://pfam.wustl.edu/.

[35] P. T. Spellman, G. Sherlock, M. Q. Zhang, V. R. Iyer, K. Anders, M. B. Eisen, P. O. Brown, D. Botstein, and B. Futcher. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 9:3273-3297, 1998.

[36] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. Springer-Verlag, 1993.

[37] S. Tornaletti and P. C. Hanawalt. Effect of DNA lesions on transcription elongation. Biochimie, 81:139-146, 1999.

[38] D. Weaver, C. Workman, and G. Stormo. Modeling regulatory networks with weight matrices. In Pac. Symp. Biocomputing, pp. 112-123, 1999.

[39] X. Wen, S. Fuhrman, G. S. Michaels, D. B. Carr, S. Smith, J. L. Barker, and R. Somogyi. Large-scale temporal gene expression mapping of central nervous system development. Proc. Nat. Acad. Sci. USA, 95:334-339, 1998.

[40] G. Yona, N. Linial, and M. Linial. ProtoMap: automated classification of all protein sequences: a hierarchy of protein families, and local maps of the protein space. Proteins: Structure, Function, and Genetics, 37:360-378, 1998.