
Journal of Pharmacological and Toxicological Methods 53 (2006) 31 – 37

www.elsevier.com/locate/jpharmtox

Appraisal of state-of-the-art

Toxicogenomic analysis methods for predictive toxicology


Jeff Maggioli *, Aubree Hoover, Lee Weng
Rosetta Biosoftware, 401 Terry Avenue, North Seattle, WA 98109, U.S.A.

Received 20 May 2005; accepted 23 May 2005

Abstract

Toxicogenomics, the application of genomic data to elucidate or predict an organism’s response to a toxicant, can inform the drug
development process in important ways. It is apparent that standardized approaches to many types of toxicogenomic questions are still being
formulated. Specifically, a significant body of proof of principle studies has emerged that demonstrates a range of statistical methodologies
applied to predictive toxicology. These studies rely on class prediction methods – mathematical models generated using the gene expression
profiles of known toxins from representative toxicological classes – to predict the toxicological effect of a compound based on the
similarities between its gene expression profile and the profiles of a given toxicological class. Class prediction methods hold promise for
increasing the rate at which compounds can be evaluated for toxicity early in the drug discovery process, while at the same time reducing the
length of toxicological studies and their associated costs. Class prediction methods are informed by class comparison and class discovery
steps, which inform, respectively, the selection of genes whose response can be used to distinguish among the toxicological classes and the
number of classes distinguishable using the response of these genes. Together these steps use a variety of complementary statistical
techniques to achieve a successful class prediction model. This report attempts to review some of the themes that appear to be emerging in the
application of these techniques to predictive toxicology methods over toxicogenomics’ short history.
© 2005 Elsevier Inc. All rights reserved.

Keywords: Predictive toxicology; Toxicogenomics; Gene expression; Review

* Corresponding author.
E-mail address: myra_ozeta@rosettabio.com (J. Maggioli).
1056-8719/$ - see front matter © 2005 Elsevier Inc. All rights reserved.
doi:10.1016/j.vascn.2005.05.006

1. Introduction

As a provider of commercial software for gene-expression data analysis and management, we work closely
with pharmaceutical companies, biotechnology organizations, and academic institutions. Through our
interaction with these entities, we have observed a clear trend toward the use of expression profiling in
toxicogenomic endeavors. When applied to mechanistic toxicology studies, gene expression data can be
mined to find which genes out of hundreds or thousands monitored are perturbed by a treatment, providing
important clues about a toxin’s underlying mechanism of toxicity (Afshari, Nuwaysir, & Barrett, 1999).
A growing number of studies demonstrate that gene expression data are also useful for class prediction
studies (predictive toxicology), in which expression signatures from known toxins are used to predict the
toxicological class of an unknown toxicant (e.g., a new chemical entity or drug candidate). If robust class
prediction methods can be routinely generated, the drug development process will benefit, as significant
gains are expected in both the speed of analyzing candidate compounds and in reducing development costs,
as many compounds could be eliminated before undergoing traditional toxicological studies.
Gene expression data can provide an early indication of toxicity because toxin-mediated changes in gene
expression are often detectable before clinical chemistry, histopathology, or clinical observations suggest
a toxic effect (Ulrich & Friend, 2001). However, to fulfill the promise of accelerating preclinical
evaluation of drug candidates, many hurdles remain, including the creation of databases containing
relevant gene expression data from studies of known toxins, division of known toxins into toxicant classes
distinguishable using expression data, understanding the time and dose-dependency of gene response, and
correlation of gene

response to phenotypes (Pognan, 2004). Class prediction methods only indicate possible relationships
between gene response and phenotypes. Further study is necessary to distinguish causative from reactive
genes.
In addition, the information-rich data set and the dynamic nature of gene expression present
computational challenges for the routine use of class prediction methods in drug development. A genomic
data set is a complex matrix, often containing thousands of individual data points. Genes useful for class
prediction are selected for inclusion in the discriminatory gene set, whose expression values (the gene
signature) can be used to distinguish among the toxicological classes studied. This discriminatory gene
set must be discovered against a complex background of other gene expression changes, some resulting from
factors unrelated to the treatment (e.g., sampling time) and others a result of the treatment, but not
useful for class prediction, as they are perturbed in a similar way by diverse toxins (e.g., genes involved
in metabolic pathways) (Hamadeh, Bushel, Jayadev, Martin et al., 2002).
Gene expression experiments also present a challenge to traditional statistical significance testing
because significant change must be calculated for datasets with many variables (potentially tens of
thousands) but few available experimental replicates. Finally, toxins may affect gene expression in
complex ways, requiring statistical methods that can consider the interaction of expression changes; that
is, genes excluded by statistical significance tests like ANOVA (because they do not change significantly
across groups) may still have predictive value when coupled to the response of genes that do change.
The most successful computational methods for class prediction are the supervised learning methods (or
classifiers). These methods rely on a training set consisting of gene expression profiles from
representatives of the different toxicological classes to be modeled. The gene signatures from samples in
the training set and the knowledge of their origins (toxicological class) are used to derive a set of
algorithms that can be used to classify unknowns. Class prediction methods most often follow class
comparison and class discovery steps, which, respectively, inform the selection of the discriminatory gene
set and help to define the toxicological classes distinguishable by gene expression signatures. These steps
often make use of complementary statistical techniques. For example, in the class comparison step,
statistical significance tests like ANOVA may be applied to select genes that vary among the toxicological
classes being studied. Clustering techniques, considered unsupervised learning methods in that they only
consider gene expression data and not toxicological class information when representing similarities
between treatments, are usually applied in the class discovery step. Literature examples exist that
describe the class comparison, class discovery, and class prediction steps for the classification of known
hepatotoxins as either peroxisome proliferators or enzyme inducers (Hamadeh, Bushel, Jayadev, DiSorbo et
al., 2002), known toxins to one of five characterized toxicological classes (Thomas et al., 2001), and
toxic metals into seven or nine distinct groups (Tsai et al., 2005).
The published accounts of class prediction methods, including the related steps of class comparison and
class discovery, show much diversity in the statistical methods applied, with some general themes repeated
in many, but not all, studies (e.g., the use of unsupervised learning methods for class discovery and
supervised learning methods for class prediction). This review attempts to survey the ways different groups
are approaching the computational challenges posed by the use of gene expression data for predictive
toxicology and provide a discussion of possible future directions for class prediction methods.

2. The process of predictive toxicogenomics

At a high level, the process leading up to a successful class prediction model can be represented as three
to five steps (see Fig. 1).

Data Preparation: Datasets are corrected for sources of variability that result from causes other than the
treatments under study (e.g., hybridization differences in preparation of the microarrays, variable
recoveries of mRNA, fluorescent dye labeling efficiencies).
Class Comparison: The prepared data from a training set are analyzed to define the discriminatory gene set,
that is, the set of genes that allow for differentiation among

[Figure 1: diagram with boxes labeled Data Preparation, Class Comparison, Class Discovery, Class Prediction, and Evaluation.]

Fig. 1. An abstract view of the relationship of class prediction to the related steps that inform it.

the toxicological classes represented in the training set.
Class Discovery: The similarity among treatments present in the training set is visualized using techniques
like clustering, an unsupervised learning method that groups treatments based only on the similarities in
their gene signatures and does not employ knowledge about the samples’ toxicological class.
Class Prediction: Unknown or blinded samples are assigned to toxicological classes. Typically, a classifier
(supervised learning method) is applied to the gene signatures of the training set samples to generate a
mathematical model for predicting the toxicological class of unknowns.
Evaluation: The model generated for class prediction is evaluated. Blinded samples can be used to estimate
success rates in predicting the toxicological class of unknowns, or individual samples from the training
set can be used to evaluate the model, using a “leave one out” validation approach.

Current literature examples include a wide variety of techniques used for class prediction and the related
steps shown in Fig. 1. Though it is convenient to break toxicogenomic studies down into discrete steps,
mapping statistical methods to these steps can be a challenge. Multiple statistical methods are often
evaluated at each step in the process of generating a class prediction model. Furthermore, techniques
commonly associated with one step may be applied in another. For example, supervised learning methods,
typically associated with the class prediction step, may be applied as a class comparison technique, to
identify the discriminatory gene set (Hamadeh, Bushel, Jayadev, DiSorbo et al., 2002). Finally, because
supervised learning methods do not lend themselves well to visualization, a number of other statistical
techniques may be used to provide qualitative views of the similarities between treatment groups. The flow
diagram shown in Fig. 2 attempts to capture the diversity of approaches used to generate class prediction
methods.

2.1. Data preparation

All microarray experiments are affected by systematic and random error. Random error can be generated by
factors such as background noise, scanner noise, and hybridization noise. The ideal way to reduce random
error is to generate many replicates and perform data analysis on the combined replicates. When replicates
are limiting, as they often are in

[Figure 2: diagram with boxes labeled Microarray Data, Data Preparation, Normalized data, Hypothesis Testing, Candidate Discriminatory Gene Sets, Learning Methods (Unsupervised, Supervised), Data-Driven Discovery, Classification Model, and Evaluation or Validation, plus Complementary Data Sets (Clinical, Histopathology, Proteomic, Metabolomic).]

Fig. 2. An information-centric view of the class prediction process and the steps that inform it. Families of techniques are represented by the blue boxes (e.g.,
Hypothesis Testing includes parametric methods like t-tests, ANOVA, and non-parametric methods like Wilcoxon and SAM). In any one study, multiple
techniques from the same family are often applied for comparison. The evaluation step informs the success of each technique. Selection of statistical methods
and discriminatory gene sets is often refined in an iterative process to generate a final classification model for unknowns or blinded samples.

microarray studies, an estimation of random errors can be useful. Systematic errors have known causes or
well-understood behaviors, and can be corrected. Examples of microarray systematic error include scanner
sensitivity or non-zero background intensities. Preprocessing algorithms such as background subtraction,
normalization, and de-trending can reduce or eliminate systematic error. An in-depth discussion of data
preprocessing and normalization methods is beyond the scope of this paper, but can be found in a number of
references (e.g., Baldi & Hatfield, 2002).

2.2. Class comparison

A common approach to class comparison is to search for a discriminatory gene set among expression profiles
generated from studies of toxins representative of known toxicological classes. Statistical significance
testing is often used to select the discriminatory gene sets. For example, to select discriminatory genes
from a training set of expression profiles from rats exposed to nine toxic metals, an ANOVA F-test was used
to find those genes that varied significantly across the nine treatment groups and an OVA (one-versus-all)
test identified gene expression that varied significantly when each group was compared to the average of
the other eight (Tsai et al., 2005). The resulting two discriminatory gene sets (the set defined from the
F-test and the union of the nine groups returned from the OVA analysis), as well as a third set, consisting
of those genes appearing in both original sets, were then evaluated for their ability to classify toxic
metals successfully.
Another approach is to reduce the dimensionality of the complex data set using dimension-reducing
techniques such as principal component analysis (PCA), multidimensional scaling (MDS), or wavelet
transformation (Yang, Blomme, & Waring, 2004). Rather than requiring the selection of specific genes from a
data set, these techniques reduce the high dimensionality of the original data set, which can include
thousands of variables, into a smaller number of weighted variables. One disadvantage of this approach is
that information about which genes are modified most for individual classes is obscured (Tsai et al.,
2005).
A combination approach is sometimes used. To identify a discriminatory set for classifying hepatotoxins,
ANOVA analysis was used to identify the top 200 genes that varied among the groups studied. Wavelet
transformation was then applied to the expression profiles from these 200 genes, reducing their response
into seven components (Yang et al., 2004). These seven components were carried forward to the class
prediction stage.

2.3. Class discovery

Clustering methods are often applied to visualize the similarities between individual treatments as well as
multiple treatments from different toxicological classes. Hierarchical clustering represents the distance
between samples visually: more similar gene expression profiles are grouped together. Clustering is a
valuable exploratory technique for helping to characterize how many classes a given set of treatments can
be divided into.
Two common types of clustering algorithms are hierarchical and partitioning algorithms. Hierarchical
algorithms yield a hierarchy of clusters for a data set that can be visualized in a dendrogram tree (see
Fig. 3). Data sets belonging to the same branch of a cluster are similar to each other at some level,
whereas data sets in separate branches are less similar.
Partitioning algorithms like K-means divide the data set into an a priori specified number of clusters that
are viewed in tabular format to make inferences about their similarity within a cluster. Because
partitioning algorithms result in bins, unique inferences about the relationship of each data point in a
cluster to every other data point in the cluster are not apparent. Similarly, further inferences about the
relationship of all the data points in one cluster to all the data points in another cluster cannot be
drawn (see Fig. 4).
The results of clustering are influenced by the type of similarity measure used to calculate the distance
between items in the clusters. Distance-based similarity measures, like Euclidean distance, emphasize the
magnitude of the fold changes between data sets. Correlation-based measures, such as correlation with mean
subtraction, emphasize the pattern of the fold changes.
Cluster analysis was used to study how gene response varied over time for a given treatment (Hamadeh,
Bushel, Jayadev, Martin et al., 2002). The results of time course analysis can help further refine the
genes included in the discriminatory gene set, as one of the common goals for a class prediction method is
a time-independent model, a model that excludes genes whose response is highly unstable with time. In the
same study, clustering, PCA, and correlation analysis were all used to demonstrate the similarities between
test compounds from the two classes (peroxisome proliferators and enzyme inducers) and provide preliminary
evidence that creation of a model to distinguish these classes should be attainable.
Hierarchical clustering algorithms using a distance-based (Euclidean distance) or correlation-based (one
minus correlation coefficient) similarity measure were compared for their ability to cluster datasets from
rats treated with nine toxic metals (Tsai et al., 2005). Though clustering was an investigative step to
examine the number of classes represented by the nine treatments, the clusters were compared by examining
their ability to group replicate treatments together into eight groups in a class prediction type exercise
(groupings were defined by the study authors). Though clustering methods are useful for investigating
similarities between treatments (class discovery), clustering is not recommended for use in class
prediction. Clustering is a subjective technique, whose results are highly influenced by selection of the
clustering algorithm and similarity metric (Simon, Radmacher, Dobbin, & McShane, 2003).
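
Looking back at the gene-selection step of Section 2.2, the idea of ranking genes by how strongly they vary across toxicological classes can be sketched in a few lines. This is a minimal illustration and not the procedure of any cited study: the function names (`anova_f`, `top_genes`), the toy expression values, and the choice to rank by the raw one-way ANOVA F statistic instead of a p-value are all assumptions made for the example.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a single gene.

    groups: list of lists, one list of expression values per class.
    """
    k = len(groups)                       # number of classes
    n = sum(len(g) for g in groups)       # total number of samples
    grand_mean = mean(v for g in groups for v in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        # No within-class spread: F is undefined, so treat it as extreme or zero.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_genes(expression, labels, n_top):
    """Rank genes by F statistic and return the n_top highest.

    expression: dict mapping gene name -> list of values (one per sample).
    labels: class label of each sample, in the same order as the values.
    """
    classes = sorted(set(labels))
    scores = {}
    for gene, values in expression.items():
        groups = [[v for v, c in zip(values, labels) if c == cls] for cls in classes]
        scores[gene] = anova_f(groups)
    return sorted(scores, key=scores.get, reverse=True)[:n_top]


# Hypothetical toy data: six samples, two classes, three genes.
LABELS = ["class_1"] * 3 + ["class_2"] * 3
EXPRESSION = {
    "gene_1": [1.0, 1.1, 0.9, 3.0, 3.1, 2.9],  # clear difference between classes
    "gene_2": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],  # identical in both classes
    "gene_3": [1.0, 2.0, 3.0, 1.5, 2.5, 3.5],  # small shift, large spread
}
selected = top_genes(EXPRESSION, LABELS, 2)
```

A fixed top-N cutoff resembles the "top 200 genes" selection described in Section 2.2 only in spirit; with tens of thousands of genes and few replicates, a real analysis would also have to address multiple testing.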

Fig. 3. An example of two types of hierarchical clustering algorithms applied to the data sets derived from rats treated with 15 known hepatotoxins (taken from
Waring et al., 2001). The 2D clusters were generated using the Rosetta Resolver® System. Reproduced with permission.
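
The influence of the similarity measure on hierarchical clustering, discussed in Section 2.3, can be illustrated with a small sketch. The code below is an assumption-laden toy, not the algorithm used to produce Fig. 3: it implements a plain average-linkage agglomerative clustering and runs it once with Euclidean distance and once with one minus the Pearson correlation coefficient. The function names, the sample names, and the four invented profiles are hypothetical.

```python
from math import dist, sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def correlation_distance(a, b):
    """One minus the correlation coefficient: 0 for identical patterns."""
    return 1.0 - pearson(a, b)


def agglomerate(profiles, distance, n_clusters):
    """Average-linkage agglomerative clustering, stopped at n_clusters.

    profiles: dict mapping sample name -> list of expression values.
    Returns a sorted list of clusters, each a sorted list of sample names.
    """
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        return mean(distance(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical profiles: two response patterns (A and B), each at low and high magnitude.
PROFILES = {
    "A_low": [1.0, 0.0, 1.0, 0.0],
    "A_high": [6.0, 0.0, 6.0, 0.0],
    "B_low": [0.0, 1.0, 0.0, 1.0],
    "B_high": [0.0, 5.0, 0.0, 5.0],
}
by_magnitude = agglomerate(PROFILES, dist, 2)
by_pattern = agglomerate(PROFILES, correlation_distance, 2)
```

On these invented profiles, Euclidean distance isolates the strongest responder (`A_high`) and lumps the other three samples together, whereas the correlation-based distance groups the samples by pattern (A with A, B with B). The same data therefore yield different class structures depending on the metric, which is the subjectivity noted above.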

Fig. 4. Representation of a K-means cluster.

2.4. Class prediction and evaluation

Class prediction typically relies on supervised learning methods (classifiers) to assign a toxicant to a
known group. The methods use a discriminatory gene set (or a data set that has been reduced using a
dimension-reducing technique) derived from a training set, to obtain a mathematical model that can predict
the class membership of unknowns. One active area of study is the practice of filtering out invariant gene
expression signals versus applying classifiers to the full data set. A recent study compares the effects of
two different types of data filtering on the performance of four classifiers for distinguishing genotoxic
from non-genotoxic compounds (Van Delft et al., 2005). There are a variety of classification methods
available, including Linear Discriminant Analysis (LDA), nearest-neighbor (NN) methods, Naïve Bayesian
classifiers, as well as machine learning methods, such as bagging methods, support vector machines (SVM),
and artificial neural networks (ANN). Dudoit et al. compared many of these methods for their ability to
classify tumors and found that for their test data set, LDA and NN methods yielded the best prediction
accuracies (Dudoit, Fridlyand, & Speed, 2002).
In an approach that combines class comparison and class prediction, Hamadeh et al. used both kNN and LDA
classifiers to select genes most useful for distinguishing between the two classes of compounds being
studied: peroxisome proliferators and enzyme inducers (Hamadeh, Bushel, Jayadev, DiSorbo et al., 2002).
The k-nearest-neighbor method (kNN, applied with a genetic algorithm used for searching) yielded a ranked
list of genes useful for distinguishing between the two compound classes. LDA (applied after an initial
ANOVA analysis to exclude genes whose expression did not change significantly across compound classes)
yielded a second set of genes that appeared to discriminate between compound classes. The 22 genes
appearing in the intersection of these two sets were used to classify blinded samples. Classification was
accomplished using correlation set analysis, by calculating the pairwise Pearson correlation coefficient
between each sample in the training set and each blinded sample. Samples were considered similar if their
correlation coefficient was > 0.8.
Tsai et al. compared Fisher’s Linear Discriminant Analysis (FLDA) and kNN approaches for predicting the
class of gene signatures from the liver tissue of rats exposed to various toxic metals (Tsai et al., 2005).
A leave-one-out validation approach was used to evaluate the methods. In this approach, the classifier is
calculated using data from all but one sample. The resulting algorithm is then used to classify the sample
left out. This process is repeated until all samples have been classified. A leave-one-out validation
approach was also taken by Thomas et al. (2001), who applied a Naïve Bayesian classifier to assign samples
to five distinct toxicological classes.
Especially for classifiers built from small training sets, class prediction of a larger set of independent
samples may be necessary to characterize the method’s true performance (Simon et al., 2003). A known
weakness of classifiers is the tendency to “overfit” the data in the training set, which limits the utility
of the model for predicting profiles outside of the training set. With this view, the leave-one-out
validation approach is a reasonable first step in characterizing the performance of a method. But to
accurately characterize the method’s ability to predict the class of unknowns, a validation would need to
include samples independent from the training set, representing each toxicological class recognized by the
model (Simon et al., 2003).

3. The future of predictive toxicogenomics

With further study, the computational methods used in class comparison, class discovery, class prediction,
and evaluation will likely become more standardized. For example, the choice of supervised learning methods
for specific applications is likely to be narrowed by ongoing research, in which the theoretical merits and
performance of various classifiers are compared. The oncology literature includes a comparison of linear
discriminant analysis, classification tree, and nearest neighbor methods (Dudoit et al., 2002). The methods
were compared for their ability to successfully predict tumor class using the gene expression data from a
number of published studies. A more recent study has appeared in the toxicology literature, comparing the
performance of four supervised learning methods on their ability to distinguish genotoxic from
non-genotoxic carcinogens (Van Delft et al., 2005). In this study, the evaluation of the classifier methods
was combined with an evaluation of different input data sets (different methods for class comparison). An
analysis of methods of error rate reporting for classification methods has appeared, stressing the
importance of a sufficiently large and diverse independent validation dataset for adequate
characterization of class prediction methods (Simon et al., 2003).
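
The leave-one-out validation described in Section 2.4 can be sketched as follows. This is an illustrative toy under stated assumptions, not the implementation of any cited study: it uses a simple k-nearest-neighbor majority vote with Euclidean distance, and the function names, class labels, and six two-gene profiles are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(training, query, k):
    """Majority vote among the k training profiles closest to the query.

    training: list of (profile, label) pairs; query: a profile (list of floats).
    """
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples, k):
    """Hold out each sample in turn, classify it from the rest, return the hit rate."""
    correct = 0
    for i, (profile, label) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]   # all samples except the held-out one
        if knn_predict(training, profile, k) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical training set: three samples per class, two genes per profile.
SAMPLES = [
    ([0.0, 0.1], "class_1"), ([0.2, 0.0], "class_1"), ([0.1, 0.2], "class_1"),
    ([3.0, 3.1], "class_2"), ([3.2, 2.9], "class_2"), ([2.9, 3.0], "class_2"),
]
accuracy_k3 = leave_one_out_accuracy(SAMPLES, k=3)
accuracy_k5 = leave_one_out_accuracy(SAMPLES, k=5)
```

With k = 3 every held-out sample is classified correctly, while with k = 5 every one is misclassified, because removing a sample leaves its own class outnumbered three to two among the five remaining neighbors. The contrast shows how sensitive leave-one-out estimates can be when the training set is small, which is one reason an independent validation set is recommended above.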

A number of other publications have begun to detail the many obstacles that remain before class prediction
methods can begin to fulfill the promises of accelerating the drug development process, or possibly even
replacing some traditional toxicological studies. Among these challenges are the cost-intensive process of
building relevant databases of gene expression profiles of known toxins (Lühe et al., 2005; Van Delft et
al., 2005), the difficulties of comparing gene expression data collected using different technologies
(Hayes et al., 2005), and the challenge of making useful predictions of toxicity across species or from in
vitro systems (e.g., cultured primary hepatocytes) to living organs in human beings (Pognan, 2004).
Though many obstacles remain, work continues to try to make class prediction methods robust and
sufficiently relevant for routine use in toxicological evaluation of novel compounds. A continual
refinement in the application and evaluation of computational approaches will undoubtedly continue to be an
important part of this effort.

References

Afshari, C. A., Nuwaysir, E. F., & Barrett, J. C. (1999). Application of complementary DNA microarray
technology to carcinogen identification, toxicology, and drug safety evaluation. Cancer Research, 59,
4759–4760.
Baldi, P., & Hatfield, W. (2002). DNA microarrays and gene expression. Cambridge, UK: Cambridge University
Press.
Dudoit, S., Fridlyand, J., & Speed, T. P. (2002). Comparison of discrimination methods for the
classification of tumors using gene expression data. Journal of the American Statistical Association, 97,
77–87.
Hamadeh, H. K., Bushel, P. R., Jayadev, S., DiSorbo, O., Bennett, L., & Li, L., et al. (2002). Prediction
of compound signature using high density gene expression profiling. Toxicological Sciences, 67, 232–240.
Hamadeh, H. K., Bushel, P. R., Jayadev, S., Martin, K., DiSorbo, O., & Sieber, S., et al. (2002). Gene
expression analysis reveals chemical-specific profiles. Toxicological Sciences, 67, 219–231.
Hayes, K. R., Vollrath, A. L., Zastrow, G. M., McMillian, B. J., Craven, M., & Jovanovich, S., et al.
(2005). EDGE: A centralized resource for the comparison, analysis, and distribution of toxicogenomic
information. Molecular Pharmacology, 67, 1360–1368.
Lühe, A., Suter, L., Ruepp, S., Singer, T., Weiser, T., & Albertini, S. (2005). Toxicogenomics in the
pharmaceutical industry: Hollow promises or real benefit? Mutation Research, 575(1–2), 102–115.
Pognan, F. (2004). Genomics, proteomics and metabonomics in toxicology: Hopefully not 'fashionomics'.
Pharmacogenomics, 5(7), 879–893.
Simon, R., Radmacher, M. D., Dobbin, K., & McShane, L. M. (2003). Pitfalls in the use of DNA microarray
data for diagnostic and prognostic classification. Journal of the National Cancer Institute, 95, 14–18.
Thomas, R. S., Rank, D. R., Penn, S. G., Zastrow, G. M., Hayes, K. R., & Pande, K., et al. (2001).
Identification of toxicologically predictive gene sets using cDNA microarrays. Molecular Pharmacology, 60,
1189–1194.
Tsai, C. A., Lee, T. C., Ho, I. C., Yang, U. C., Chen, C. H., & Chen, J. J. (2005). Multi-class clustering
and prediction in the analysis of microarray data. Mathematical Biosciences, 193, 79–100.
Ulrich, R., & Friend, S. (2001). Toxicogenomics and drug discovery: Will new technologies help us produce
better drugs? Nature Reviews Drug Discovery, 1, 84–88.
Van Delft, J. H. M., van Agen, E., van Breda, S. G. J., Herwijnen, M. H., Staal, Y. C. M., & Kleinjans, J.
C. S. (2005). Comparison of supervised clustering methods to discriminate genotoxic from non-genotoxic
carcinogens by gene expression profiling. Mutation Research, 575, 1–3.
Waring, J. F., Jolly, R. A., Ciurlionis, R., Lum, P. Y., Praestgaard, J. T., & Morfitt, D. C., et al.
(2001). Clustering of hepatotoxins based on mechanism of toxicity using gene expression profiles.
Toxicology and Applied Pharmacology, 175, 28–42.
Yang, Y., Blomme, E. A., & Waring, J. F. (2004). Toxicogenomics in drug discovery: From preclinical studies
to clinical trials. Chemico-Biological Interactions, 150, 71–85.
