Using Bayesian Networks to Analyze Expression Data

Nir Friedman*    Michal Linial†    Iftach Nachman‡    Dana Pe'er§

Hebrew University
Jerusalem, 91904, ISRAEL

* School of Computer Science and Engineering, nir@cs.huji.ac.il
† Institute of Life Sciences, michall@leonardo.ls.huji.ac.il
‡ Center for Neural Computation & School of Computer Science and Engineering, iftach@cs.huji.ac.il
§ School of Computer Science and Engineering, danab@cs.huji.ac.il

Abstract

DNA hybridization arrays simultaneously measure the expression levels of thousands of genes. These measurements provide a "snapshot" of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biological features of cellular systems.

In this paper, we propose a new framework for discovering interactions between genes based on multiple expression measurements. This framework builds on the use of Bayesian networks for representing statistical dependencies. A Bayesian network is a graph-based model of joint multivariate probability distributions that captures properties of conditional independence between variables. Such models are attractive for their ability to describe complex stochastic processes, and for providing clear methodologies for learning from (noisy) observations.

We start by showing how Bayesian networks can describe interactions between genes. We then present an efficient algorithm capable of learning such networks and a statistical method to assess our confidence in their features. Finally, we apply this method to the S. cerevisiae cell-cycle measurements of Spellman et al. [35] to uncover biological features.

1 Introduction

A central goal of molecular biology is to understand the regulation of protein synthesis and its reactions to external and internal signals. All the cells in an organism carry the same genomic data, but their protein makeup can be drastically different, both temporally and spatially, due to regulation. Protein synthesis is regulated by many mechanisms at its different levels. These include mechanisms for controlling transcription initiation, RNA splicing, mRNA transport, translation initiation, post-translational modifications, and degradation of mRNA/protein. One of the main junctions at which regulation occurs is mRNA transcription. A major role in this machinery is played by proteins themselves, which bind to regulatory regions along the DNA, greatly affecting the transcription of the genes they regulate.

In recent years, technical breakthroughs in spotting hybridization probes and advances in genome sequencing efforts have led to the development of DNA microarrays, which consist of many species of probes, either oligonucleotides or cDNA, that are immobilized in a predefined organization on a solid phase. By using DNA microarrays, researchers are now able to measure the abundance of thousands of mRNA targets simultaneously [12, 27, 39]. Unlike classical experiments, where the expression levels of only a few genes were reported, DNA microarray experiments can measure all the genes of an organism, providing a "genomic" viewpoint on gene expression. As a consequence, this technology facilitates new experimental approaches for understanding gene expression and regulation [24, 35].

Early microarray experiments examined few samples, and mainly focused on differential display across tissues or conditions of interest. The design of recent experiments focuses on performing a larger number of microarray experiments, ranging in size from a dozen to a few hundreds of samples. In the near future, data sets containing thousands of samples will become available. Such experiments collect enormous amounts of data, which clearly reflect many aspects of the underlying biological processes. An important challenge is to develop methodologies that are both statistically sound and computationally tractable for analyzing such data sets and inferring biological interactions from them.

Most of the analysis tools currently used are based on clustering algorithms. These algorithms attempt to locate groups of genes that have similar expression patterns over a set of experiments [2, 4, 15, 29, 35]. Such analysis has proven to be useful in discovering genes that are co-regulated. A more ambitious goal for analysis is revealing the structure of the transcriptional regulation process [1, 6, 33, 38]. This is clearly a hard problem. The current data is extremely noisy. Moreover, mRNA expression data alone only gives a partial picture that does not reflect key events such as translation and protein (re)activation. Finally, the amount of samples, even in the largest experiments in the foreseeable future, does not provide enough information to construct a full detailed model with high statistical significance.

In this paper, we introduce a new approach for analyzing gene expression patterns that uncovers properties of the transcriptional program by examining statistical properties of dependence and conditional independence in the data. We base our approach on the well-studied statistical tool of Bayesian networks [31]. These networks represent the dependence structure between multiple interacting quantities (e.g., expression levels of different genes). Our approach, probabilistic in nature, is capable of handling noise and estimating the confidence in the different features of the network.
We are therefore able to focus on interactions whose signal in the data is strong.

Bayesian networks are a promising tool for analyzing gene expression patterns. First, they are particularly useful for describing processes composed of locally interacting components; that is, the value of each component directly depends on the values of a relatively small number of components. Second, statistical foundations for learning Bayesian networks from observations, and computational algorithms to do so, are well understood and have been used successfully in many applications. Finally, Bayesian networks provide models of causal influence: although Bayesian networks are mathematically defined strictly in terms of probabilities and conditional independence statements, a connection can be made between this characterization and the notion of direct causal influence [22, 32, 36].

The remainder of this paper is organized as follows. In Section 2, we review key concepts of Bayesian networks, learning them from observations, and using them to infer causality. In Section 3, we describe how Bayesian networks can be applied to model interactions among genes and discuss the technical issues that are posed by this type of data. In Section 4, we apply our approach to the gene-expression data of Spellman et al. [35], analyzing the statistical significance of the results and their biological plausibility. Finally, in Section 5, we conclude with a discussion of related approaches and future work.

2 Bayesian Networks

2.1 Representing Distributions with Bayesian Networks

Consider a finite set X = {X_1, ..., X_n} of random variables where each variable X_i may take on a value x_i from the domain Val(X_i). In this paper, we focus on finite domains, though much of the following holds for infinite domains, such as continuous valued random variables. We use capital letters, such as X, Y, Z, for variable names and lowercase letters x, y, z to denote specific values taken by those variables. Sets of variables are denoted by boldface capital letters X, Y, Z, and assignments of values to the variables in these sets are denoted by boldface lowercase letters x, y, z. We write I(X; Y | Z) to mean that X is independent of Y conditioned on Z.

A Bayesian network is a representation of a joint probability distribution. This representation consists of two components. The first component, G, is a directed acyclic graph whose vertices correspond to the random variables X_1, ..., X_n. The second component describes a conditional distribution for each variable, given its parents in G. Together, these two components specify a unique distribution on X_1, ..., X_n.

The graph G represents conditional independence assumptions that allow the joint distribution to be decomposed, economizing on the number of parameters. The graph G encodes the Markov Assumption:

(*) Each variable X_i is independent of its non-descendants, given its parents in G.

By applying the chain rule of probabilities and properties of conditional independencies, any joint distribution that satisfies (*) can be decomposed in the product form

    P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i)),

where Pa(X_i) is the set of parents of X_i in G. Figure 1 shows an example of a graph G, lists the Markov independencies it encodes, and the product form they imply.

Figure 1: An example of a simple Bayesian network structure. This network structure implies several conditional independence statements: I(A; E), I(B; D | A, E), I(C; A, D, E | B), I(D; B, C, E | A), and I(E; A, D). The network structure also implies that the joint distribution has the product form P(A, B, C, D, E) = P(A)P(B | A, E)P(C | B)P(D | A)P(E).

To specify a joint distribution, we also need to specify the conditional probabilities that appear in the product form. This is the second component of the network representation. This component describes distributions P(x_i | pa(X_i)) for each possible value x_i of X_i and pa(X_i) of Pa(X_i). In the case of finite valued variables, we can represent these conditional distributions as tables. Generally, Bayesian networks are flexible and can accommodate many forms of conditional distribution, including various continuous models.

Given a Bayesian network, we might want to answer many types of questions that involve the joint probability (e.g., what is the probability of X = x given observation of some of the other variables?) or independencies in the domain (e.g., are X and Y independent once we observe Z?). The literature contains a suite of algorithms that can answer such queries (e.g., see [25, 31]), exploiting the explicit representation of structure in order to answer queries efficiently.
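As a minimal sketch of the product form, consider the network of Figure 1 with binary variables; the conditional probability tables below are illustrative numbers only, not values estimated from data.

```python
# Parents of each variable in the DAG of Figure 1.
parents = {"A": [], "B": ["A", "E"], "C": ["B"], "D": ["A"], "E": []}

# cpt[X][parent_values] = P(X = 1 | parents); the numbers are made up.
cpt = {
    "A": {(): 0.6},
    "B": {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.9},
    "C": {(0,): 0.2, (1,): 0.8},
    "D": {(0,): 0.3, (1,): 0.5},
    "E": {(): 0.5},
}

def joint_probability(assignment):
    """P(X_1, ..., X_n) = prod_i P(X_i | Pa(X_i)) for a full assignment."""
    prob = 1.0
    for var, pa in parents.items():
        p_one = cpt[var][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[var] == 1 else 1.0 - p_one
    return prob

print(joint_probability({"A": 1, "B": 1, "C": 0, "D": 1, "E": 0}))
```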
2.2 Equivalence Classes of Bayesian Networks

A Bayesian network structure G implies a set of independence assumptions in addition to (*). Let Ind(G) be the set of independence statements (of the form X is independent of Y given Z) that hold in all distributions satisfying these Markov assumptions. These can be derived as consequences of (*).

More than one graph can imply exactly the same set of independencies. For example, consider graphs over two variables X and Y. The graphs X → Y and X ← Y both imply the same set of independencies (i.e., Ind(G) = ∅). We say that two graphs G and G' are equivalent if Ind(G) = Ind(G').

This notion of equivalence is crucial, since when we examine observations from a distribution, we cannot distinguish between equivalent graphs. Results of [7, 32] show that we can characterize equivalence classes of graphs using a simple representation. In particular, these results establish that equivalent graphs have the same underlying undirected graph but might disagree on the direction of some of the arcs.

Theorem 2.1 [32] Two graphs are equivalent if and only if their DAGs have the same underlying undirected graph and the same v-structures (i.e., converging directed edges into the same node, such as a → b ← c).

Moreover, an equivalence class of network structures can be uniquely represented by a partially directed graph (PDAG), where a directed edge X → Y denotes that all members of the equivalence class contain the arc X → Y; an undirected edge X — Y denotes that some members of the class contain the arc X → Y, while others contain the arc Y → X. Given a directed graph G, the PDAG representation of its equivalence class can be constructed efficiently [7].
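The equivalence test of Theorem 2.1 is mechanical enough to sketch in code. The following is an illustration under one assumption worth making explicit: a v-structure is counted only when the two converging parents are themselves non-adjacent.

```python
from itertools import combinations

def skeleton(dag):
    """dag maps each node to the set of its parents; returns undirected edges."""
    return {frozenset((x, p)) for x in dag for p in dag[x]}

def v_structures(dag):
    skel = skeleton(dag)
    vs = set()
    for b, pa in dag.items():
        for a, c in combinations(sorted(pa), 2):
            if frozenset((a, c)) not in skel:  # a and c must be non-adjacent
                vs.add((a, b, c))
    return vs

def equivalent(g1, g2):
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

# X -> Y and X <- Y: same skeleton, no v-structures, hence equivalent.
print(equivalent({"X": set(), "Y": {"X"}},
                 {"X": {"Y"}, "Y": set()}))  # True
```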
2.3 Learning Bayesian Networks

The problem of learning a Bayesian network can be stated as follows. Given a training set D = {x^1, ..., x^N} of independent instances of X, find a network B = ⟨G, Θ⟩ that best matches D. (More precisely, we search for an equivalence class of networks that best matches D.) The common approach to this problem is to introduce a statistically motivated scoring function that evaluates each network with respect to the training data, and to search for the optimal network according to this score.

A commonly used scoring function is the Bayesian score (see [10, 21] for a complete description):

    S(G : D) = \log P(G \mid D) = \log P(D \mid G) + \log P(G) + C,

where C is a constant independent of G and

    P(D \mid G) = \int P(D \mid G, \Theta) \, P(\Theta \mid G) \, d\Theta

is the marginal likelihood, which averages the probability of the data over all possible parameter assignments to G. The particular choice of priors P(G) and P(Θ | G) for each G determines the exact Bayesian score. Under mild assumptions on the prior probabilities, this scoring metric is asymptotically consistent: given a sufficiently large number of samples, graph structures that exactly capture all dependencies in the distribution will receive, with high probability, a higher score than all other graphs [19]. This means that, given a sufficiently large number of instances, learning procedures can pinpoint the exact network structure up to the correct equivalence class.

Heckerman et al. [21] present a family of priors, called BDe priors, that satisfy two important requirements. First, these priors are structure equivalent, i.e., if G and G' are equivalent structures they are guaranteed to have the same score. Second, the priors are decomposable. That is, the score can be rewritten as the sum

    S_{BDe}(G : D) = \sum_i ScoreContribution_{BDe}(X_i, Pa(X_i) : D),

where the contribution of every variable X_i to the total network score depends only on its own value and the values of its parents in G. These two properties are satisfied for BDe priors when all instances x^t in D are complete, that is, they assign values to all the variables in X.

Once the prior is specified (we use an uninformative prior in our experiments) and the data is given, learning amounts to finding the structure G that maximizes the score. This problem is known to be NP-hard [8], thus we resort to heuristic search. The decomposition of the score is crucial for this optimization problem. A local search procedure that changes one arc at each move can efficiently evaluate the gains made by adding, removing or reversing a single arc. An example of such a procedure is a greedy hill-climbing algorithm that at each step performs the local change that results in the maximal gain, until it reaches a local maximum. Although this procedure does not necessarily find a global maximum, it does perform well in practice. Examples of other search methods that advance using one-arc changes include beam search, stochastic hill-climbing, and simulated annealing.
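The following sketch shows how score decomposability makes one-arc hill-climbing cheap: each candidate move re-scores only the single family it changes. It is an illustration under stated assumptions: `local_score` is a placeholder for a decomposable score such as BDe, and arc reversal is omitted for brevity.

```python
def is_acyclic(parents):
    """Check that the parent sets define a DAG (depth-first search)."""
    done, stack = set(), set()
    def visit(v):
        if v in stack:
            return False
        if v in done:
            return True
        stack.add(v)
        ok = all(visit(p) for p in parents[v])
        stack.discard(v)
        done.add(v)
        return ok
    return all(visit(v) for v in parents)

def hill_climb(variables, local_score):
    parents = {x: set() for x in variables}        # start from the empty graph
    cache = {x: local_score(x, parents[x]) for x in variables}
    while True:
        best = None                                # (gain, child, new parent set)
        for y in variables:
            for x in variables:
                if x == y:
                    continue
                new_pa = parents[y] ^ {x}          # toggle: add or remove x -> y
                if not is_acyclic({**parents, y: new_pa}):
                    continue
                gain = local_score(y, new_pa) - cache[y]
                if best is None or gain > best[0]:
                    best = (gain, y, new_pa)
        if best is None or best[0] <= 0:
            return parents                         # local maximum reached
        _, y, new_pa = best
        parents[y] = new_pa
        cache[y] = local_score(y, new_pa)
```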

2.4 Learning Causal Patterns

A Bayesian network is a model of dependencies between multiple measurements. We are also interested in modeling the process that generated these dependencies. Thus, we need to model the flow of causality in the system of interest (e.g., gene transcription). A causal network is a model of such causal processes. A causal network is similar to a Bayesian network (i.e., a DAG where each node represents a random variable along with a local probability model for each node), the difference being that it interprets the parents of a variable as its immediate causes.

We can relate causal networks and Bayesian networks by assuming the Causal Markov Assumption: given the values of a variable's immediate causes, it is independent of its earlier causes. When the causal Markov assumption holds, the causal network satisfies the Markov independencies of the corresponding Bayesian network, thus allowing us to treat causal networks as Bayesian networks. For example, this assumption is a natural one in models of genetic pedigrees: once we know the genetic makeup of the individual's parents, the genetic makeup of her ancestors is not informative about her own genetic makeup.

The main difference between causal and Bayesian networks is that a causal network models not only the distribution of the observations, but also the effects of interventions. If X causes Y, then manipulating the value of X (i.e., setting it to another value in such a way that the manipulation itself does not affect the other variables) affects the value of Y. On the other hand, if Y is a cause of X, then manipulating X will not affect Y. Thus, although the Bayesian networks X → Y and X ← Y are equivalent, as causal networks they are not.

When can we learn a causal network from observations? This issue received a thorough treatment in the literature [22, 32, 36]. From observations alone, we cannot distinguish between causal networks that specify the same independence assumptions, i.e., belong to the same equivalence class. When learning an equivalence class (PDAG) from the data, we can conclude that the true causal network is possibly any one of the networks in this class. If a directed arc X → Y is in the PDAG, then all the networks in the equivalence class agree that X is an immediate cause of Y. Thus, we infer the causal direction of the interaction between X and Y. We stress that we can infer such causal relations without any experimental intervention (e.g., knockouts and over-expressions) among our samples.

3 Applying Bayesian Networks to Expression Data

In this section we describe our approach to analyzing gene expression data using Bayesian network learning techniques. We model the expression level of each gene as a random variable. In addition, other attributes that affect the system can be modeled as random variables. These can include a variety of attributes of the sample, such as experimental conditions, temporal indicators (i.e., the time/stage that the sample was taken from), background variables (e.g., which clinical procedure was used to get a biopsy sample), and exogenous cellular conditions.

By learning a Bayesian network based on the statistical dependencies between these variables, we can answer a wide range of queries about the system. For example, does the expression level of a particular gene depend on the experimental condition? Is this dependence direct or indirect? If it is indirect, which genes mediate the dependency? We now describe how one can learn such a model from the gene expression data. Many important issues arise when learning in this domain. These involve statistical aspects of interpreting the results, algorithmic complexity issues in learning from the data, and preprocessing the data.
Most of the difficulties in learning from expression data revolve around the following central point: contrary to previous applications of learning Bayesian networks, expression data involves transcript levels of thousands of genes, while current data sets contain at most a few dozen samples. This raises problems in computational complexity and the statistical significance of the resulting networks. On the positive side, genetic regulation networks are sparse, i.e., given a gene, it is assumed that no more than a few dozen genes directly affect its transcription. Bayesian networks are especially suited for learning in such sparse domains.

3.1 Representing Partial Models

When learning models with many variables, small data sets are not sufficiently informative to significantly determine that a single model is the "right" one. Instead, many different networks should be considered as reasonable explanations of the given data. From a Bayesian perspective, we say that the posterior probability over models is not dominated by a single model (or equivalence class of models).¹ Our approach is to analyze this set of plausible (i.e., high-scoring) networks. Although this set can be very large, we might attempt to characterize features that are common to most of these networks, and focus on learning them.

¹This observation is not unique to Bayesian network models. It applies equally well to other models that are learned from gene expression data, such as clustering models.

Before we examine the issue of inferring such features, we briefly discuss two classes of features involving pairs of variables. While at this point we handle only pairwise features, it is clear that this analysis is not restricted to them, and in the future we are planning on examining more complex features.

The first type of feature is Markov relations: Is Y in the Markov blanket of X? The Markov blanket of X is the minimal set of variables that shield X from the rest of the variables in the model. More precisely, X given its Markov blanket is independent from the remaining variables in the network. It is easy to check that this relation is symmetric: Y is in X's Markov blanket if and only if there is either an edge between them, or both are parents of another variable [31]. In the context of gene expression analysis, a Markov relation indicates that the two genes are related in some joint biological interaction or process. Note that two variables in a Markov relation are directly linked in the sense that no variable in the model mediates the dependence between them. It remains possible that an unobserved variable (e.g., protein activation) is an intermediate in their interaction.

The second type of feature is order relations: Is X an ancestor of Y in all the networks of a given equivalence class? That is, does the given PDAG contain a path from X to Y in which all the edges are directed? This type of feature does not involve only a close neighborhood, but rather captures a global property. Recall that under the assumptions of Section 2.4, learning that X is an ancestor of Y would imply that X is a cause of Y. However, these assumptions do not necessarily hold in the context of expression data. Thus, we view such a relation as an indication, rather than evidence, that X might be a causal ancestor of Y.
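Given a learned DAG, the Markov blanket of a variable is easy to read off: its parents, its children, and its children's other parents. A small sketch, using the network of Figure 1:

```python
def markov_blanket(x, parents):
    """parents maps each variable to its parent set in the DAG."""
    children = {y for y, pa in parents.items() if x in pa}
    spouses = {p for y in children for p in parents[y]} - {x}
    return set(parents[x]) | children | spouses

fig1 = {"A": set(), "B": {"A", "E"}, "C": {"B"}, "D": {"A"}, "E": set()}
print(markov_blanket("A", fig1))  # {'B', 'D', 'E'}: children B, D and co-parent E
```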
3.2 Estimating Statistical Confidence in Features

We now face the following problem. To what extent does the data support a given feature? More precisely, we want to estimate a measure of confidence in the features of the learned networks, where "confidence" approximates the likelihood that a given feature is actually true (i.e., is based on a genuine correlation and causation).

An effective, and relatively simple, approach for estimating confidence is the bootstrap method [14]. The main idea behind the bootstrap is simple. We generate "perturbed" versions of our original data set, and learn from them. In this way we collect many networks, all of which are fairly reasonable models of the data. These networks show how small perturbations to the data can affect many of the features.

In our context, we use the bootstrap as follows:

• For i = 1, ..., m (in our experiments, we set m = 200):
  - Re-sample, with replacement, N instances from D. Denote by D_i the resulting dataset.
  - Apply the learning procedure on D_i to induce a network structure G_i.
• For each feature f of interest, calculate

    conf(f) = \frac{1}{m} \sum_{i=1}^{m} f(G_i),

  where f(G) is 1 if f is a feature in G, and 0 otherwise.

We refer the reader to [16] for more details, as well as large-scale simulation experiments with this method. These simulation experiments show that features induced with high confidence are rarely false positives, even in cases where the data sets are small compared to the system being learned. This bootstrap procedure appears especially robust for the Markov and order features described in Section 3.1.
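The procedure above maps directly to code. In this sketch, `learn_network` stands in for the structure learner of Section 2.3 and `feature_holds` for a test of one pairwise feature (a Markov or order relation) on a learned structure; both are assumed helpers, not part of the original text.

```python
import random

def bootstrap_confidence(data, features, learn_network, feature_holds, m=200):
    """data: list of N complete instances; returns conf(f) for each feature."""
    counts = {f: 0 for f in features}
    n = len(data)
    for _ in range(m):
        resampled = [random.choice(data) for _ in range(n)]  # N with replacement
        g = learn_network(resampled)
        for f in features:
            if feature_holds(f, g):
                counts[f] += 1
    return {f: counts[f] / m for f in features}
```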
3.3 Efficient Learning Algorithms

In Section 2.3, we formulated learning Bayesian network structure as an optimization problem in the space of directed acyclic graphs. The number of such graphs is super-exponential in the number of variables. As we consider hundreds and thousands of variables, we must deal with an extremely large search space. Therefore, we need to use (and develop) efficient search algorithms.

To facilitate efficient learning, we need to be able to focus the attention of the search procedure on relevant regions of the search space, giving rise to the Sparse Candidate algorithm [18]. The main idea of this technique is that we can identify a relatively small number of candidate parents for each gene based on simple local statistics (such as correlation). We then restrict our search to networks in which only the candidate parents of a variable can be its parents, resulting in a much smaller search space in which we can hope to find a good structure quickly.

A possible pitfall of this approach is that early choices can result in an overly restricted search space. To avoid this problem, we devised an iterative algorithm that adapts the candidate sets during search. At each iteration n, for each variable X_i, the algorithm chooses the set C_i^n = {Y_1, ..., Y_k} of variables which are the most promising candidate parents for X_i. We then search for B_n, an optimal network in which Pa^{G_n}(X_i) ⊆ C_i^n. The network found is then used to guide the selection of better candidate sets for the next iteration. We ensure that B_n monotonically improves in each iteration by requiring Pa^{G_{n-1}}(X_i) ⊆ C_i^n. The algorithm continues until there is no change in the candidate sets.

We briefly outline our method for choosing C_i^n. We assign each X_j some score of relevance to X_i, choosing the variables with the highest score. A natural score that measures the dependence between two variables is their mutual information, denoted I(X; Y). The following is an example that arises with such a score: consider the network in Figure 1. If I(B; D) > I(B; E), then for k = 2, E will be left out of C_B^1. Since A mediates the dependence between B and D, the network learned in this iteration will contain only A as B's parent. We can use this conditional independence to improve our candidate sets.
A better score is the conditional mutual information, I(X_i; X_j | Pa^{G_{n-1}}(X_i)). The score we actually use is an estimator of the conditional mutual information in the underlying distribution that also takes into account the number of parameters needed to learn the conditional probability of X_i.

We refer the reader to [18] for more details on the algorithm and its complexity, as well as empirical results comparing its performance to traditional search techniques.
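As an illustration of the initial candidate-selection step (a sketch, not the implementation of [18]): estimate mutual information between each pair of genes from discretized data, and keep the k highest-scoring variables as candidate parents for each gene.

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) from paired discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def candidate_parents(data, k):
    """data: dict gene -> list of discretized expression values."""
    genes = list(data)
    return {
        g: sorted((h for h in genes if h != g),
                  key=lambda h: mutual_information(data[g], data[h]),
                  reverse=True)[:k]
        for g in genes
    }
```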
3.4 Discretization

In order to specify a Bayesian network model, we still need to define the local probability model for each variable. At the current stage, we choose to focus on the qualitative aspects of the data, and so we discretize gene expression values into three categories: -1, 0, and 1, depending on whether the expression rate is significantly lower than, similar to, or greater than the respective control. The control expression level of a gene can be either determined experimentally (as in the methods of [12]), or it can be set as the average expression level of the gene across experiments. The meaning of "significantly" is defined by setting a threshold on the ratio between measured expression and control. In our experiments we choose a threshold value of 0.5 in logarithmic (base 2) scale.

It is clear that by discretizing the measured expression levels we are losing information. An alternative to discretization is using (semi)parametric density models for representing conditional probabilities in the networks we learn (e.g., [23, 26, 30]). However, a bad choice of the parametric family can strongly bias the learning algorithm. We believe that discretization provides a reasonably unbiased approach for dealing with this type of data. We are currently exploring the appropriateness of several density models for this type of data.
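A minimal sketch of this discretization rule, using the 0.5 log-ratio threshold stated above:

```python
from math import log2

def discretize(measured, control, threshold=0.5):
    """Map an expression measurement to -1, 0, or 1 relative to its control."""
    ratio = log2(measured / control)
    if ratio < -threshold:
        return -1          # significantly lower than control
    if ratio > threshold:
        return 1           # significantly greater than control
    return 0               # similar to control

print(discretize(400.0, 100.0))  # log2(4) = 2 > 0.5, so prints 1
```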
4 Application to Cell Cycle Expression Patterns

We applied our approach to the data of Spellman et al. [35], containing 76 gene expression measurements of the mRNA levels of 6177 S. cerevisiae ORFs. These experiments measure six time series under different cell cycle synchronization methods. Spellman et al. [35] identified 800 genes whose expression varied over the different cell-cycle stages. Of these, 250 clustered into 8 distinct clusters based on the similarity of expression profiles. We learned networks whose variables were the expression levels of each of these 800 genes. Some of the robustness analysis was performed only on the set of 250 genes that appear in the 8 major clusters.

In learning from this data, we treat each measurement as a sample from a distribution, and do not take into account the temporal aspect of the measurements. Since it is clear that the cell cycle process is of a temporal nature, we compensate by introducing an additional variable denoting the cell cycle phase. This variable is forced to be a root in all the networks learned. Its presence allows us to model dependency of expression levels on the current cell cycle phase.²

²We note that we can learn temporal models using a Bayesian network that includes gene expression values in two (or more) consecutive time points [17]. This raises the number of variables in the model. We are currently pursuing this issue.

We used the Sparse Candidate algorithm with a 200-fold bootstrap in the learning process. The learned features show that we can recover intricate structure even from such small data sets. It is important to note that our learning algorithm uses no prior biological knowledge or constraints. All learned networks and relations are based solely on the information conveyed in the measurements themselves. These results are available at our WWW site: http://www.cs.huji.ac.il/labs/compbio/expression. Figure 2 illustrates the graphical display of the results of this analysis.

Figure 2: An example of the graphical display of Markov features. This graph shows a "local map" for the gene SVS1. The width (and color) of edges corresponds to the computed confidence level. An edge is directed if there is a sufficiently high confidence in the order between the pair of genes connected by the edge. This local map shows that CLN2 separates SVS1 from several other genes. Although there is a strong connection between CLN2 and all these genes, there are no other edges connecting them. This indicates that, with high confidence, these genes are conditionally independent given the expression level of CLN2.

4.1 Robustness Analysis

We performed a number of tests to analyze the statistical significance and robustness of our procedure. We carried out most of these tests on the smaller 250-gene data set for computational reasons.

To test the credibility of our confidence assessment, we created a random data set by randomly permuting the order of the experiments independently for each gene. Thus for each gene the order was random, but the composition of the series remained unchanged. In such a data set, genes are independent of each other, and thus we do not expect to find "real" features. As expected, both order and Markov relations in the random data set have significantly lower confidence. We compare the distribution of confidence estimates between the original data set and the randomized set in Figure 3. Clearly, the distribution of confidence estimates in the original data set has a longer and heavier tail in the high confidence region. The runs on the random data sets learn almost nothing with a confidence level above 0.8, which leads us to believe that most features that are learned in the original data set with such confidence levels originate in true signals in the data. Also, the confidence distribution for the real dataset is concentrated closer to zero than the random distribution. This suggests that the networks learned from the real data are sparser.
Figure 3: Histograms of confidence levels for the cell cycle data set and the randomized data set. The histograms on the left are of order relations, and the ones on the right are of Markov relations. The histograms on the top row show the distribution of confidence levels in the interval [0, 1]. The histograms on the bottom row show the tails of these distributions for high-confidence features. These histograms are all based on the 250 genes data set.
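The randomized data set described above is straightforward to generate; this sketch permutes each gene's measurements independently, destroying between-gene dependencies while preserving each gene's value composition.

```python
import random

def randomize_dataset(data):
    """data: dict gene -> list of expression values across experiments."""
    shuffled = {}
    for gene, values in data.items():
        perm = values[:]        # copy, then permute this gene independently
        random.shuffle(perm)
        shuffled[gene] = perm
    return shuffled

# Confidence estimates computed from the shuffled data serve as a null
# distribution against which the original confidence estimates are compared.
```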

Since the analysis was not performed on the whole S. cerevisiae genome, we also tested the robustness of our analysis to the addition of more genes, comparing the confidence of the learned features between the 250 and 800 gene datasets. Figure 4 compares feature confidence in the analysis of the two datasets. As we can see, there is a strong correlation between confidence levels of the features between the two data sets.

Figure 4: Comparison between significance levels with different numbers of genes in the analysis. Each relation is shown as a point, with the x-coordinate being its confidence in the 250 genes data set and the y-coordinate its confidence in the 800 genes data set. The left panel shows order relation features, and the right panel shows Markov relation features.

A crucial choice in our procedure is the threshold level used for discretization of the expression levels. It is clear that by setting a different threshold we would get different discrete expression patterns. Thus, it is important to test the robustness and sensitivity of the high-confidence features to the choice of this threshold. This was tested by repeating the experiments using different threshold levels. Again, the graphs show a definite linear tendency in the confidence estimates of features between the different discretization thresholds. Obviously, this linear correlation gets weaker for larger threshold differences. We also note that order relations are much more robust to changes in the threshold than the Markov relations.

A valid criticism of our discretization method is that it penalizes genes whose natural range of variation is small: since we use a fixed threshold, we would not detect changes in such genes. A possible way to avoid this problem is to normalize the expression of genes in the data. That is, we rescale the expression level of each gene so that the relative expression of all genes has the same mean and variance. We note that analysis methods that use Pearson correlation to compare genes, such as [4, 15], are implicitly performing such a normalization.³ When we discretize a normalized dataset, we are essentially rescaling the discretization factor differently for each gene, depending on its variance in the data. We tried this approach with several discretization levels, and got results comparable to our original discretization method. The 20 top Markov relations highlighted by this method were a bit different, but interesting and biologically sensible in their own right. The order relations were again more robust to the change of methods and discretization thresholds. A possible reason is that order relations depend on the network structure in a global manner, and thus can remain intact even after many local changes to the structure. The Markov relation, being a local one, is more easily disrupted. Since the graphs learned are extremely sparse, each discretization method "highlights" different signals in the data, which are reflected in the Markov relations learned.

³An undesired effect of such a normalization is the amplification of measurement noise. If a gene has fixed expression levels across samples, we expect the variance in measured expression levels to be noise, either in the experimental conditions or the measurements. When we normalize the expression levels of genes, we lose the distinction between such noise and true (i.e., significant) changes in expression levels. In our experiments, we can safely assume this effect will not be too grave, since we only focus on genes that display significant changes across experiments.

In summary, although many of the results we report below (especially order relations) are stable across the different experiments discussed in the previous paragraphs, it is clear that our analysis is sensitive to the discretization method. In all the different discretization methods we tried, our analysis found interesting relationships in the data. Thus, the challenge is to find alternative methods that can recover all these relationships in one analysis. We are currently working on learning with (semi)parametric density models that would circumvent the need for discretization.

4.2 Biological Analysis

We believe that the results of this analysis can be indicative of biological phenomena in the data. This is confirmed by our ability to predict sensible relations between genes of known function. We now examine several consequences that we have learned from the data. We consider, in turn, the order relations and Markov relations found by our analysis.
Table 1: List of dominant genes in the ordering relations (top 14 out of 30)

Gene/ORF | Dominance score | # descendants (conf > 0.8) | # descendants (conf > 0.7) | Notes
YLR183C | 551 | 609 | 708 | Contains forkhead-associated domain, thus possibly nuclear
MCD1 | 550 | 599 | 710 | Mitotic Chromosome Determinant; null mutant is inviable
CLN2 | 497 | 495 | 654 | Role in cell cycle START; null mutant exhibits G1 arrest
SRO4 | 463 | 405 | 639 | Involved in cellular polarization during budding
RFA2 | 456 | 429 | 617 | Involved in nucleotide excision repair; null mutant is inviable
YOL007C | 444 | 367 | 624 |
YOX1 | 400 | 243 | 556 | Homeodomain protein
GAT3 | 398 | 309 | 531 | Putative GATA zinc finger transcription factor related to polII transcription
POL30 | 376 | 173 | 520 | Required for DNA replication and repair; null mutant is inviable
RSR1 | 352 | 140 | 461 | GTP-binding protein of the ras family involved in bud site selection
CLN1 | 324 | 74 | 404 | Role in cell cycle START; null mutant exhibits G1 arrest
YBR089W | 298 | 29 | 333 |
MSH6 | 284 | 7 | 325 | Required for mismatch repair in mitosis and meiosis

4.2.1 Order Relations

The most striking feature of the high-confidence order relations is the existence of dominant genes. Out of all 800 genes, only a few seem to dominate the order (i.e., appear before many genes). The intuition is that these genes are indicative of potential causal sources of the cell-cycle process. Let C_o(X, Y) denote the confidence in X being an ancestor of Y. We define the dominance score of X as

    \sum_{Y : C_o(X, Y) > t} C_o(X, Y)^k,

using the constant k for rewarding high-confidence features and the threshold t to discard low-confidence ones. These dominant genes are extremely robust to parameter selection for t, k, and the discretization cutoff of Section 3.4. A list of the highest-scoring dominating genes appears in Table 1.
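The dominance score is a short computation over the bootstrap confidences. In this sketch, `conf` maps ordered gene pairs (X, Y) to C_o(X, Y); the values k = 2 and t = 0.3 are illustrative placeholders, since the exact settings are left open above.

```python
def dominance_score(x, genes, conf, k=2, t=0.3):
    """Sum C_o(x, y)**k over genes y with ancestor-confidence above t."""
    total = 0.0
    for y in genes:
        if y == x:
            continue
        c = conf.get((x, y), 0.0)   # confidence that x is an ancestor of y
        if c > t:
            total += c ** k
    return total
```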
Inspection of the list of dominant genes reveals quite a few interesting features. Among the dominant genes are those directly involved in cell-cycle control and initiation. For example, CLN1, CLN2 and CDC5, whose functional relation has been established [11, 13]. Other genes, like MCD1 and RFA2, were found to be essential [20]. These are clearly key genes in basic cell functions, involved in chromosome dynamics and stability (MCD1) and in nucleotide excision repair (RFA2). Most of the dominant genes encode nuclear proteins, and some of the unknown genes are also potentially nuclear (e.g., YLR183C contains a forkhead-associated domain, which is found almost entirely among nuclear proteins). Some of them are components of pre-replication complexes. Others (like RFA2, POL30 and MSH6) are involved in DNA repair. It is known that DNA repair is a prerequisite for transcription, and DNA areas which are more active in transcription are also repaired more frequently [28, 37].

A few non-nuclear dominant genes are localized in the cytoplasm membrane (SRO4 and RSR1). These are involved in the budding and sporulation processes, which have an important role in the cell cycle. RSR1 belongs to the ras family of proteins, which are known as initiators of signal transduction cascades in the cell.

4.2.2 Markov Relations

Inspection of the top Markov relations reveals that most are functionally related. A list of the top scoring relations can be found in Table 2. Among these, all involving two known genes make sense biologically. When one of the ORFs is unknown, careful searches using Psi-Blast [3], Pfam [34] and Protomap [40] can reveal firm homologies to proteins functionally related to the other gene in the pair (e.g., YHR143W, which is paired to the endochitinase CTS1, is related to EGT2, a cell wall maintenance protein). Several of the unknown pairs are physically adjacent on the chromosome, and thus presumably regulated by the same mechanism (see [5]), although special care should be taken for pairs whose chromosomal locations overlap on complementary strands, since in these cases we might see an artifact resulting from cross-hybridization. Such analysis raises the number of biologically sensible pairs to 19/20.

There are some interesting Markov relations found that are beyond the limitations of clustering techniques. One such regulatory link is FAR1–ASH1: both proteins are known to participate in a mating-type switch. The correlation of their expression patterns is low, and [35] cluster them into different clusters. Among the high-confidence Markov relations, one can also find examples of conditional independence, i.e., a group of highly correlated genes whose correlation can be explained within our network structure. One such example involves the genes CLN2, RNR3, SVS1, SRO4 and RAD51; their expression is correlated, and in [35] all appear in the same cluster. In our network, CLN2 is, with high confidence, a parent of each of the other four genes, while no links are found between them. This suits biological knowledge: CLN2 is a central and early cell-cycle control, while there is no clear biological relationship between the others.

5 Discussion and Future Work

In this paper we presented a new approach for analyzing gene expression data that builds on the theory and algorithms for learning Bayesian networks. We described how to apply these techniques to gene expression data. The approach builds on two techniques that were motivated by the challenges posed by this domain: a novel search algorithm [18] and an approach for estimating statistical confidence [16]. We applied our methods to real expression data of Spellman et al. [35]. Although we did not use any prior knowledge, we managed to extract many biologically plausible conclusions from this analysis.

Our approach is quite different from the clustering approach used by [2, 4, 15, 29, 35], in that it attempts to learn a much richer structure from the data. Our methods are capable of discovering causal relationships, interactions between genes other than positive correlation, and finer intra-cluster structure. We are currently developing hybrid approaches that combine our methods with clustering algorithms to learn models over "clustered" genes.

The biological motivation of our approach is similar to work on inducing genetic networks from data [1, 6, 33, 38]. There are two key differences: First, the models we learn have probabilistic semantics. This better fits the stochastic nature of both the biological processes and noisy experimentation. Second, our focus is on extracting features that are pronounced in the data, in contrast to current genetic network approaches that attempt to find a single model that explains the data.
Table 2: List of top Markov relations

Confidence | Gene 1 | Gene 2 | Notes
1.0 | YKL163W-PIR3 | YKL164C-PIR1 | Close locality on chromosome
0.985 | PRY2 | YKR012C | Close locality on chromosome
0.985 | MCD1 | MSH6 | Both bind to DNA during mitosis
0.98 | PHO11 | PHO12 | Both nearly identical acid phosphatases
0.975 | HHT1 | HTB1 | Both are histones
0.97 | HTB2 | HTA1 | Both are histones
0.94 | YNL057W | YNL058C | Close locality on chromosome
0.94 | YHR143W | CTS1 | Homolog to EGT2 cell wall control; both involved in cytokinesis
0.92 | YOR263C | YOR264W | Close locality on chromosome
0.91 | YGR086 | SIC1 | Homolog to mammalian nuclear ran protein; both involved in nuclear function
0.9 | FAR1 | ASH1 | Both part of a mating type switch; expression uncorrelated
0.89 | CLN2 | SVS1 | Function of SVS1 unknown
0.88 | YDR033W | NCE2 | Homolog to transmembrane proteins suggests both involved in protein secretion
0.86 | STE2 | MFA2 | A mating factor and receptor
0.85 | HHF1 | HHF2 | Both are histones
0.85 | MET10 | ECM17 | Both are sulfite reductases
0.85 | CDC9 | RAD27 | Both participate in Okazaki fragment processing

We are currently working on improving methods for expression analysis by expanding the framework described in this work. Promising directions for such extensions are: (a) developing the theory for learning local probability models that are capable of dealing with the continuous nature of the data; (b) improving the theory and algorithms for estimating confidence levels; (c) incorporating biological knowledge (such as possible regulatory regions) as prior knowledge to the analysis; (d) improving our search heuristics; (e) applying Dynamic Bayesian Networks ([17]) to temporal expression data.

Finally, one of the most exciting longer-term prospects of this line of research is discovering causal patterns from gene expression data. We plan to build on and extend the theory for learning causal relations from data and apply it to gene expression. The theory of causal networks allows learning both from observational data and interventional data, where the experiment intervenes with some causal mechanisms of the observed system. In the gene expression context, we can model knockout/over-expressed mutants as such interventions. Thus, we can design methods that deal with mixed forms of data in a principled manner (see [9] for a recent work in this direction). In addition, this theory can provide tools for experimental design, that is, understanding which interventions are deemed most informative to determining the causal structure in the underlying system.

Acknowledgements

The authors are grateful to Gill Bejerano, Hadar Benyaminy, David Engelberg, Moises Goldszmidt, Daphne Koller, Matan Ninio, Itzik Pe'er, and Gavin Sherlock for comments on drafts of this paper and useful discussions relating to this work. We also thank Matan Ninio for help in running and analyzing the robustness experiments. This work was supported through the generosity of the Michael Sacher Trust.

References

[1] T. Akutsu, S. Kuhara, O. Maruyama, and S. Miyano. Identification of gene regulatory networks by strategic gene disruptions and gene over-expressions. In Proc. Ninth Annual ACM-SIAM Symposium on Discrete Algorithms. ACM-SIAM, 1998.
[2] U. Alon, N. Barkai, D. A. Notterman, K. Gish, S. Ybarra, D. Mack, and A. J. Levine. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Nat. Acad. Sci. USA, 96:6745-6750, 1999.
[3] S. Altschul, L. Thomas, A. Schaffer, Z. Zhang, J. Zhang, W. Miller, and D. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 1997.
[4] A. Ben-Dor, R. Shamir, and Z. Yakhini. Clustering gene expression patterns. Journal of Computational Biology, 6:281-297, 1999.
[5] T. Blumenthal. Gene clusters and polycistronic transcription in eukaryotes. Bioessays, pp. 480-487, 1998.
[6] T. Chen, V. Filkov, and S. Skiena. Identifying gene regulatory networks from experimental data. In Proc. Third Annual International Conference on Computational Molecular Biology (RECOMB), 1999.
[7] D. M. Chickering. A transformational characterization of equivalent Bayesian network structures. In Proc. Eleventh Conference on Uncertainty in Artificial Intelligence (UAI '95), pp. 87-98, 1995.
[8] D. M. Chickering. Learning Bayesian networks is NP-complete. In D. Fisher and H.-J. Lenz, editors, Learning from Data: Artificial Intelligence and Statistics V. Springer-Verlag, 1996.
[9] G. Cooper and C. Yoo. Causal discovery from a mixture of experimental and observational data. In Proc. Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI '99), pp. 116-125, 1999.
[10] G. F. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309-347, 1992.
[11] F. Cvrckova and K. Nasmyth. Yeast G1 cyclins CLN1 and CLN2 and a GAP-like protein have a role in bud formation. EMBO J., 12:5277-5286, 1993.
[12] J. DeRisi, V. Iyer, and P. Brown. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 282:699-705, 1997.
[13] M. A. Drebot, G. C. Johnston, J. D. Friesen, and R. A. Singer. An impaired RNA polymerase II activity in Saccharomyces cerevisiae causes cell-cycle inhibition at START. Mol. Gen. Genet., 241:327-334, 1993.
[14] B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall, London, 1993.
[15] M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. Proc. Nat. Acad. Sci. USA, 95:14863-14868, 1998.
[16] N. Friedman, M. Goldszmidt, and A. Wyner. Data analysis with Bayesian networks: A bootstrap approach. In Proc. Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI '99), pp. 206-215, 1999.
[17] N. Friedman, K. Murphy, and S. Russell. Learning the structure of dynamic probabilistic networks. In Proc. Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI '98), pp. 139-147, 1998.
[18] N. Friedman, I. Nachman, and D. Pe'er. Learning Bayesian network structure from massive datasets: The "sparse candidate" algorithm. In Proc. Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI '99), pp. 196-205, 1999.
[19] N. Friedman and Z. Yakhini. On the sample complexity of learning Bayesian networks. In Proc. Twelfth Conference on Uncertainty in Artificial Intelligence (UAI '96), pp. 274-282, 1996.
[20] V. Guacci, D. Koshland, and A. Strunnikov. A direct link between sister chromatid cohesion and chromosome condensation revealed through the analysis of MCD1 in S. cerevisiae. Cell, 91(1):47-57, October 1997.
[21] D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197-243, 1995.
[22] D. Heckerman, C. Meek, and G. Cooper. A Bayesian approach to causal discovery. Technical Report MSR-TR-97-05, Microsoft Research, 1997.
[23] R. Hofmann and V. Tresp. Discovering structure in continuous variables using Bayesian networks. In Advances in Neural Information Processing Systems 8 (NIPS '96). MIT Press, 1996.
[24] V. R. Iyer, M. B. Eisen, D. T. Ross, G. Schuler, T. Moore, J. C. F. Lee, J. M. Trent, L. M. Staudt, J. Hudson, M. S. Boguski, D. Lashkari, D. Shalon, D. Botstein, and P. O. Brown. The transcriptional program in the response of human fibroblasts to serum. Science, 283:83-87, 1999.
[25] F. V. Jensen. An Introduction to Bayesian Networks. University College London Press, London, 1996.
[26] D. Koller, U. Lerner, and D. Angelov. A general algorithm for approximate inference and its application to hybrid Bayes nets. In Proc. Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI '99), pp. 324-333, 1999.
[27] D. J. Lockhart, H. Dong, M. C. Byrne, M. T. Follettie, M. V. Gallo, M. S. Chee, M. Mittmann, C. Wang, M. Kobayashi, H. Horton, and E. L. Brown. DNA expression monitoring by hybridization of high density oligonucleotide arrays. Nature Biotechnology, 14:1675-1680, 1996.
[28] W. G. McGregor. DNA repair, DNA replication, and UV mutagenesis. J. Investig. Dermatol. Symp. Proc., 4:1-5, 1999.
[29] G. S. Michaels, D. B. Carr, M. Askenazi, S. Fuhrman, X. Wen, and R. Somogyi. Cluster analysis and data visualization for large scale gene expression data. In Pac. Symp. Biocomputing, pp. 42-53, 1998.
[30] K. Murphy. Inference and learning in hybrid Bayesian networks. Technical Report CSD-98-990, U.C. Berkeley, 1998.
[31] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Francisco, Calif., 1988.
[32] J. Pearl and T. S. Verma. A theory of inferred causation. In Principles of Knowledge Representation and Reasoning: Proc. Second International Conference (KR '91), pp. 441-452, 1991.
[33] R. Somogyi, S. Fuhrman, M. Askenazi, and A. Wuensche. The gene expression matrix: Towards the extraction of genetic network architectures. In The Second World Congress of Nonlinear Analysts (WCNA), 1996.
[34] E. L. Sonnhammer, S. R. Eddy, E. Birney, A. Bateman, and R. Durbin. Pfam: multiple sequence alignments and HMM-profiles of protein domains. Nucl. Acids Res., 26:320-322, 1998. http://pfam.wustl.edu/.
[35] P. T. Spellman, G. Sherlock, M. Q. Zhang, V. R. Iyer, K. Anders, M. B. Eisen, P. O. Brown, D. Botstein, and B. Futcher. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 9:3273-3297, 1998.
[36] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. Springer-Verlag, 1993.
[37] S. Tornaletti and P. C. Hanawalt. Effect of DNA lesions on transcription elongation. Biochimie, 81:139-146, 1999.
[38] D. Weaver, C. Workman, and G. Stormo. Modeling regulatory networks with weight matrices. In Pac. Symp. Biocomputing, pp. 112-123, 1999.
[39] X. Wen, S. Fuhrman, G. S. Michaels, D. B. Carr, S. Smith, J. L. Barker, and R. Somogyi. Large-scale temporal gene expression mapping of central nervous system development. Proc. Nat. Acad. Sci. USA, 95:334-339, 1998.
[40] G. Yona, N. Linial, and M. Linial. ProtoMap: automated classification of all protein sequences: a hierarchy of protein families, and local maps of the protein space. Proteins: Structure, Function, and Genetics, 37:360-378, 1998.
