Microarrays I: Introduction To The Concept & Background
Joe Assouline
BME
MicroArrays lectures
• MicroArrays I: Introduction to the concept & Background
• MicroArray II: Analysis, expression, image and normalization strategies
• MicroArray III: Probe selection and techniques for utilization
Comparing subtractive hybridization with microarrays: publications in cancer research
Pioneered by Schena et al. 1996; Brown, DeRisi and Trent et al. 1996
http://derisilab.ucsf.edu
Traditional Subtractive Hybridization methods
Subtractive Hybridization Method 1
Subtractive Hybridization Method 2
SAGE: Serial Analysis of Gene Expression
Serial sequencing determines the absolute abundance of every transcript in the sample.
Gene-specific tags are produced about 15 bp at a time and concatenated; short tags (9 bp) are isolated and sequenced automatically.
SAGE
• Allows quantitative measurement of gene expression for a large number of transcripts
• A variety of SAGE libraries are available
  - A 9-bp tag can distinguish 4^9 = 262,144 transcripts
  - Tags are mapped to genes using UniGene
  - One tag may map to more than one gene; the number of tags in a library is proportional to the amount of mRNA in the biological sample
• www.ncbi.nlm.nih.gov/SAGE/
  - Allows comparison of gene expression across the various tissues for which SAGE libraries have been generated
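The tag arithmetic above can be checked with a short sketch. It assumes a library is simply a list of 9-bp tag strings; the function names and the toy library are hypothetical and only illustrate how tag counts translate into relative abundance.

```python
from collections import Counter

TAG_LENGTH = 9  # tag length quoted on the slide


def tag_space(length=TAG_LENGTH):
    """Number of distinct tags of the given length over A, C, G, T."""
    return 4 ** length


def relative_abundance(tags):
    """Map each tag to its fraction of all tags in the library."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


# Hypothetical toy library; real SAGE libraries hold many thousands of tags.
library = ["ACGTACGTA", "ACGTACGTA", "TTGCAAGCC", "ACGTACGTA"]

print(tag_space())
print(relative_abundance(library))
```

Because the tag count is proportional to mRNA abundance, the fractions returned here stand in for relative transcript levels in the sample.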
A) Co-expression plot for every pair of adjacent genes in a 70-gene region of yeast chromosome 12.
Correlation coefficients of expression profiles measured at 10-minute intervals over 2 mitotic cycles (green = positive; red = negative).
Several clusters of co-expressed adjacent genes appear along the diagonal (Cohen et al. 2000).
B) Regions of increased gene expression (RIDGEs) on 3 human chromosomes (see green bars).
Expression levels were generated from SAGE analysis of transcripts from 12 tissues and correlated with gene density (black histograms).
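As a rough illustration of panel A, the sketch below computes the Pearson correlation coefficient between the expression profiles of neighbouring genes. The profiles are made-up numbers, the function names are hypothetical, and the code assumes every profile varies over time (a constant profile would cause a division by zero).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def adjacent_correlations(profiles):
    """Correlate each gene with its right-hand neighbour on the chromosome."""
    return [pearson(profiles[i], profiles[i + 1])
            for i in range(len(profiles) - 1)]


# Hypothetical time courses for three neighbouring genes.
profiles = [
    [1.0, 2.0, 3.0, 2.0],
    [2.0, 4.0, 6.0, 4.0],  # rises and falls with the first gene
    [3.0, 2.0, 1.0, 2.0],  # mirror image of the second gene
]

print(adjacent_correlations(profiles))
```

Positive values correspond to the green cells of the plot and negative values to the red ones.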
Massively Parallel Signature Sequencing (MPSS)
A) Four types of fluorescent oligos are hybridized to the 5' overhang of DNA strands immobilized on beads
- Most internal base = most specific
- All possible combinations of A, T, G, C are included at the other positions
- A Type II endonuclease cleaves the template proximal to the previous cycle, exposing the next nucleotide
B) Beads remain immobilized; an image of the fluorescence is taken at each of the positions
- Successive images are decoded for tag frequencies
- Output readout: relative abundance of each transcript
Generalities
• MicroArrays (MA) measure gene expression
• Whereas EST sequencing projects and SAGE allow high-throughput analysis of gene expression, MAs are used to assess the differential expression (mRNA abundance) of biological samples.
Advantages | Disadvantages
Fast | Cost
Comprehensive | Unknown significance of RNA
Flexible | Uncertain quality control
Expression Analysis Flow Chart
MicroArray system
Using cDNA with 2 dyes: Cy3 and Cy5
Pos / Neg = 18 / 2 = 9
Each of the three metrics is entered into a decision matrix to produce the Absolute Call, which gives the status of each transcript (Present, Marginal, Absent).
Average Difference and Expression Level
The Average Difference (Avg Diff) serves as a relative indicator of the level of expression of a transcript.
• It is used to estimate the change in expression of a given gene between two experiments.
• It is calculated by taking the difference between the PM (perfect match) and MM (mismatch) of every probe pair and averaging the differences over the entire probe set:
  Avg Diff = (1/n) Σ (PM_i - MM_i), summed over the n probe pairs of the probe set
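The calculation described above can be sketched in a few lines. The intensities are made-up values, the function name is hypothetical, and the sketch averages over every probe pair exactly as the text states, without any outlier handling.

```python
def average_difference(probe_pairs):
    """Mean of (PM - MM) over all probe pairs of one probe set.

    probe_pairs: list of (PM, MM) intensity tuples.
    """
    diffs = [pm - mm for pm, mm in probe_pairs]
    return sum(diffs) / len(diffs)


# Hypothetical intensities for a probe set with four probe pairs.
pairs = [(1200.0, 300.0), (950.0, 400.0), (800.0, 850.0), (1500.0, 500.0)]

print(average_difference(pairs))
```

A probe pair whose MM exceeds its PM, like the third pair here, contributes a negative difference and pulls the average down.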
Decrease:
(1) (PM - MM)base - (PM - MM)exp > Change Threshold (CT), and
(2) [(PM - MM)base - (PM - MM)exp] / (PM - MM)base > Percent Change Threshold / 100
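The two Decrease conditions can be expressed as a small check. This is a sketch under stated assumptions: (PM - MM) is taken to be the probe-set average difference of each experiment, the baseline value is assumed to be positive, and the threshold values in the example are hypothetical rather than recommended settings.

```python
def is_decrease(avg_diff_base, avg_diff_exp,
                change_threshold, percent_change_threshold):
    """Return True when both Decrease conditions hold.

    Assumes avg_diff_base > 0, so the relative change is well defined.
    """
    drop = avg_diff_base - avg_diff_exp
    # Condition (1): the absolute drop exceeds the Change Threshold.
    absolute_ok = drop > change_threshold
    # Condition (2): the drop relative to baseline exceeds the percent threshold.
    relative_ok = drop / avg_diff_base > percent_change_threshold / 100
    return absolute_ok and relative_ok


# Hypothetical thresholds: CT = 100 intensity units, 50 percent change.
print(is_decrease(600.0, 200.0, 100.0, 50.0))
print(is_decrease(600.0, 450.0, 100.0, 50.0))
```

In the second call the absolute drop of 150 passes condition (1), but the relative drop of 25 percent fails condition (2), so no Decrease is reported.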
Difference Call
The Difference Call Decision Matrix is an algorithm that
generates one of five outcomes for every transcript:
Increase (I), Marginally Increase (MI), Decrease (D),
Marginally Decrease (MD), and No Change (NC). The
following four metrics are weighted differently and
entered into the Decision Matrix:
To name a few:
- General: most comprehensive
- GeneSpring: clustering, data mining (heat mapping)
- Data mining/management
- Affymetrix tools (MAS, MBEI, RMA): for collection, analysis and more of GeneChip data
Signal Extraction
- Image analysis
- Gene filtering
- Probe-level analysis of oligo arrays
- Normalization and removal of artifacts (for comparisons across arrays)
Data Analysis
- Selection of genes differentially expressed (across experimental conditions)
- Clustering and classification of biological samples
- Clustering and …..of genes
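The normalization step in the list above can take many forms, and the slides do not say which one is meant. As one simple possibility, the sketch below applies global median scaling so that arrays become comparable; the function name and the intensities are hypothetical.

```python
from statistics import median


def median_scale(arrays, target=None):
    """Rescale each array so that its median intensity equals a common target.

    arrays: list of intensity lists, one per array.
    target: desired median; defaults to the mean of the per-array medians.
    """
    medians = [median(a) for a in arrays]
    if target is None:
        target = sum(medians) / len(medians)
    return [[value * target / m for value in a]
            for a, m in zip(arrays, medians)]


# Hypothetical case: the second array was scanned twice as bright as the first.
arrays = [[100.0, 200.0, 300.0], [200.0, 400.0, 600.0]]
normalized = median_scale(arrays)

print(normalized)
```

After scaling, the two arrays carry identical values, which shows that the overall brightness difference was removed while the relative pattern within each array was kept.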
Sample Pooling
- Can be used to dilute out individual sample-to-sample variation
- Combining samples may yield enough to perform the experiment
- Cannot pool if samples are treated differently
Replication
- Always: gives you the ability to do statistics and can aid in data mining
- Reduces the effects of false positives and false negatives
- Costly
Errors and pitfalls
• Experimental design is critical
  - Adequate number of experimental vs. control samples; use replicates (no magic number)
2 Replicates
Control | Treatment
1 | 1
2 | 2
• 4 chips achieve 4 comparison data points
• Can begin to perform statistics for significance
• 50% of false-positive and false-negative values are considered genuine
3 Replicates
Control Treatment
1 1
2 2
3 3
Gene | Calls | Score | %
Gene 1 | I I D D | 7 | 78
Gene 2 | I I I I | 9 | 100
Gene 3 | I I D D | 3 | 33
Gene 4 | D D D D | 0 | 0
Gene 5 | I I I D | 6 | 67
Gene 6 | I I I I | 9 | 100
Gene 7 | I D D I | 5 | 56
Using Replicates To Mine Data
Replicate table, reorganized by score
Gene | Calls | Score | %
Gene 2 | I I I I | 9 | 100
Gene 6 | I I I I | 9 | 100
Gene 1 | I I D D | 7 | 78
Gene 5 | I I I D | 6 | 67
Gene 7 | I D D I | 5 | 56
Gene 3 | I I D D | 3 | 33
Gene 4 | D D D D | 0 | 0
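The scoring in the tables above (a score out of 9 and its percentage) is consistent with counting Increase calls over all 3 x 3 control-versus-treatment comparisons. The sketch below reproduces that idea under simplifying assumptions: a comparison is called "I" when the treatment value exceeds the control value and "D" otherwise, with no Marginal or No Change calls. The gene names and values are hypothetical.

```python
from itertools import product


def increase_score(control, treatment):
    """Count pairwise comparisons in which treatment exceeds control.

    Returns (number of Increase calls, percentage of all comparisons).
    """
    calls = ["I" if t > c else "D" for c, t in product(control, treatment)]
    score = calls.count("I")
    return score, round(100 * score / len(calls))


def rank_genes(data):
    """Sort genes from most to fewest Increase calls."""
    scored = {gene: increase_score(c, t) for gene, (c, t) in data.items()}
    return sorted(scored.items(), key=lambda item: item[1][0], reverse=True)


# Hypothetical expression values: three control and three treatment replicates.
data = {
    "gene_c": ([3.0, 3.1, 2.9], [1.0, 1.1, 0.9]),
    "gene_b": ([1.0, 2.0, 3.0], [0.5, 2.5, 1.5]),
    "gene_a": ([1.0, 1.2, 0.9], [2.0, 2.5, 2.2]),
}

for gene, (score, percent) in rank_genes(data):
    print(gene, score, percent)
```

Sorting by score mirrors the reorganized table: genes that increase consistently in every comparison rise to the top, and genes with mixed calls fall in between.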
Overview of data analysis
Assessment of the Gene Expression Profile of Differentiated and Dedifferentiated Human Fetal Chondrocytes by Microarray Analysis
David G. Stokes, Gang Liu, Ibsen B. Coimbra, Sonsoles Piera-Velazquez, Robert M. Crowl, and Sergio A. Jiménez
Arthritis & Rheumatism, Vol. 46, No. 2, February 2002, pp 404–419
Microarray analysis allows clustering of genes in subclasses
Examples of MicroArrays Studies
Cancer
A) Hierarchical clustering from solid tumor samples
B) Clustering of diffuse B-cell lymphomas
C) Kaplan-Meier plots display survival probability based on the relative level of expression of molecular subtypes
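Panel C refers to Kaplan-Meier plots. As a minimal sketch of how such a survival curve is estimated, the code below implements the product-limit estimator for a single patient group. The follow-up times and event flags are hypothetical; in a study like the one shown, one curve would be computed per molecular subtype.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival curve.

    times:  follow-up time of each patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns a list of (time, survival probability) at each observed death time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone with this follow-up time leaves the risk set afterwards.
        at_risk -= leaving
    return curve


# Hypothetical follow-up in months; 0 marks a censored patient.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]

for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Censored patients never lower the curve directly; they only shrink the number at risk for later time points.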
Key Features of Bioinformatics of Microarray data analysis
Example: GeneSpring software suite
• Data Normalization: a comprehensive suite of normalization options appropriate for different technologies.
• Data Clustering
Many options are available that use color intensity to represent the expression level
Scripting
• Graphic representation of genes and their expression patterns based on their location within a cellular pathway.
• Interactively design pathway diagrams or import them directly.
• Predict genes associated with discrete steps in the pathway of interest.
Expression Profile Comparison or Probe Entire
Enterprise Repository for Conditions (PEER-C)
In our experiments, we are searching and processing microarray data from normal and tumor tissue.
The goal is to compare gene profiles from various sources with our sampling and analysis strategies.
Note: Our samples are by and large collected following enrichment under specific culture conditions.
Advanced Statistical Tools
• Significance (p-value of 0.05): genes with significant differential expression
Using GeneSpring software by Silicon Genetics for downstream analysis of the microarray data
• Claudin 5, a protein whose function is critical to the maintenance of the blood-brain barrier (BBB). Our data also demonstrated differences in gene expression of connexin 37, a gap junction protein.
• Members of growth factor families and their receptors (e.g., tumor necrosis factor and its receptor, transforming growth factor beta receptor II)
[Plots: values on a log scale (0.01 to 10) by sample: NB, N Endo, DJM, DJP]
Venn Diagram: Assessment of Normal Brain vs. Tumor
[Venn diagram: NB vs. N Endo vs. tumor; region counts 551 and 463, all other regions 0]
Name | Comments | URL
AMAD | From Stanford and the University of California at Berkeley and at San Francisco | http://www.microarrays.org/software.html
ArrayExpress | From Alvis Brazma and colleagues at the EBI | http://www.ebi.ac.uk/arrayexpress/
ChipDB | From the Whitehead Institute | http://young39.wi.mit.edu/chipdb_public/
ExpressDB | At Harvard; relational database containing yeast RNA expression data | http://arep.med.harvard.edu/ExpressDB/
Gene Director | From Biodiscovery | http://www.biodiscovery.com
GeNet | From Silicon Genetics | http://www.sigenetics.com
GeneX | From NCGR | http://genex.ncgr.org/
GEO | Gene Expression Omnibus from NCBI | http://www.ncbi.nlm.nih.gov/geo/
GXD | From the Jackson Laboratory | http://www.informatics.jax.org/
MAdb | National Cancer Institute | http://madb.nci.nih.gov
MaxdSQL | University of Manchester | http://www.bioinf.man.ac.uk/microarray/maxd
RAD | University of Pennsylvania | http://www.cbil.upenn.edu/radZ/servlet
Stanford Microarray Database | Stanford University | http://www.dnachip.org/
Demos
• Repositories
• AMAD, from Stanford and the University of California at Berkeley and at San Francisco: http://www.microarrays.org/software.html
• Stanford: http://genome-www5.stanford.edu/cgi-bin/search/QuerySetup.pl and http://genome-www5.stanford.edu/index.shtml
• Stanford 43K human cDNA microarray (www.microarray.org/sfgf)
• TIGR: http://www.tigr.org/software/
• National Cancer Institute: http://nciarray.nci.nih.gov/