Clustering Gene Expression Data:
Part 1
CS 838
www.cs.wisc.edu/~craven/cs838.html
Mark Craven
craven@biostat.wisc.edu
April 2001
Clustering Gene Expression Profiles
• there are many different clustering algorithms
• other clustering methods have been applied to gene
expression data
– EM with Gaussian clusters [Mjolsness et al. ’99]
– self-organizing maps [Tamayo et al. ’99]
– graph-theoretic algorithms [Ben-Dor & Yakhini ’98,
Hartuv et al. ’99, Sharan & Shamir ’00]
– etc.
• two general types of approaches
– hierarchical
– non-hierarchical
Hierarchical Clustering:
A Dendrogram
height of bar indicates
degree of dissimilarity
within cluster
Scotch Whisky Dendrogram
Similarity of Two Clusters
• the similarity of two clusters can be determined in
several ways
– single link: similarity of two most similar
members
– complete link: similarity of two least similar
members
– average link: average similarity between
members
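The three linkage rules above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `cluster_similarity`, the `link` parameter, and the toy similarity function `toy_sim` (negative absolute distance between 1-D points) are all assumptions made for the example.

```python
from itertools import product

def cluster_similarity(cluster_a, cluster_b, sim, link="average"):
    """Similarity of two clusters, given a pairwise similarity function `sim`."""
    # Score every cross-cluster pair of members.
    scores = [sim(a, b) for a, b in product(cluster_a, cluster_b)]
    if link == "single":      # similarity of the two most similar members
        return max(scores)
    if link == "complete":    # similarity of the two least similar members
        return min(scores)
    if link == "average":     # average similarity between members
        return sum(scores) / len(scores)
    raise ValueError(f"unknown link type: {link}")

# Hypothetical toy similarity: closer 1-D points are more similar.
def toy_sim(a, b):
    return -abs(a - b)

left, right = [0.0, 1.0], [3.0, 7.0]
print(cluster_similarity(left, right, toy_sim, "single"))    # -2.0
print(cluster_similarity(left, right, toy_sim, "complete"))  # -7.0
print(cluster_similarity(left, right, toy_sim, "average"))   # -4.5
```

Single link tends to chain clusters together through their closest members, while complete link only merges clusters whose most distant members are still similar; average link sits between the two.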
The Data
• 79 measurements for yeast data
• collected at various time points during
– diauxic shift (shutting down genes for
metabolizing sugars, activating those for
metabolizing ethanol)
– mitotic cell division cycle
– sporulation
– temperature shock
– reducing shock
The Data
• each measurement $G_i$ represents
$$G_i = \log \frac{\text{red}_i}{\text{green}_i}$$
where $\text{red}_i$ is the test expression level, and $\text{green}_i$ is
the reference level for gene G in the i-th
experiment
• the expression profile of a gene is the vector of
measurements across all experiments
$$\langle G_1, \ldots, G_n \rangle$$
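As a small illustration of the log-ratio measurement, the sketch below builds an expression profile from paired red and green intensities. The slide does not state the base of the logarithm; base 2 is assumed here, and the function name `expression_profile` and the intensity values are made up for the example.

```python
import math

def expression_profile(red, green):
    """Return [log2(red_i / green_i)] for paired intensity readings.

    Base 2 is an assumption; the slide only says "log".
    """
    if len(red) != len(green):
        raise ValueError("red and green must have the same length")
    return [math.log2(r / g) for r, g in zip(red, green)]

# Hypothetical intensities for one gene across three experiments.
profile = expression_profile([200.0, 100.0, 50.0], [100.0, 100.0, 100.0])
print(profile)  # [1.0, 0.0, -1.0]
```

With this transform, a two-fold increase and a two-fold decrease relative to the reference have the same magnitude (+1 and -1), and an unchanged gene sits at 0.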
The Data
• m genes measured in n experiments
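$$\begin{pmatrix}
g_{1,1} & \cdots & g_{1,n} \\
g_{2,1} & \cdots & g_{2,n} \\
\vdots & & \vdots \\
g_{m,1} & \cdots & g_{m,n}
\end{pmatrix}$$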
The Task
[Figure: axis label "experiments"]
Gene Similarity Metric
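• to determine the similarity of two genes X, Y
$$S(X,Y) = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{X_i - X_{\text{offset}}}{\Phi_X} \right) \left( \frac{Y_i - Y_{\text{offset}}}{\Phi_Y} \right)$$
where
$$\Phi_G = \sqrt{ \sum_{i=1}^{N} \frac{(G_i - G_{\text{offset}})^2}{N} }$$
• with the offsets set to 0, this becomes
$$S(X,Y) = \frac{1}{N} \sum_{i=1}^{N} \frac{X_i}{\sqrt{\sum_{j=1}^{N} \frac{X_j^2}{N}}} \cdot \frac{Y_i}{\sqrt{\sum_{j=1}^{N} \frac{Y_j^2}{N}}}$$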
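The metric can be sketched directly in Python. This assumes that Φ is the square root of the mean squared deviation from the offset, as in Eisen et al.; the function names `phi` and `similarity` and the sample profiles are hypothetical.

```python
import math

def phi(g, offset=0.0):
    """Root of the mean squared deviation of profile g from its offset."""
    return math.sqrt(sum((v - offset) ** 2 for v in g) / len(g))

def similarity(x, y, x_offset=0.0, y_offset=0.0):
    """S(X, Y): mean product of offset-centered, Phi-scaled measurements."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    phi_x, phi_y = phi(x, x_offset), phi(y, y_offset)
    if phi_x == 0 or phi_y == 0:
        raise ValueError("similarity is undefined for a profile with Phi = 0")
    total = sum(((xi - x_offset) / phi_x) * ((yi - y_offset) / phi_y)
                for xi, yi in zip(x, y))
    return total / len(x)

# Offsets of 0: profiles that are scaled copies of each other score about 1.
s_zero = similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# Offsets equal to the profile means give the Pearson correlation;
# these two profiles are perfectly anti-correlated, so the score is about -1.
s_mean = similarity([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], x_offset=2.0, y_offset=2.0)
```

Choosing the mean as the offset measures co-variation around each gene's average, whereas an offset of 0 measures co-variation around the reference state (a log ratio of 0, meaning no change).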
Dendrogram for Serum
Stimulation of Fibroblasts
Eisen et al. Results
• 126 genes down-regulated in response to stress
– 112 of the genes encode ribosomal and other
proteins related to translation
– agrees with previously known result that yeast
responds to favorable growth conditions by
increasing the production of ribosomes