
Final not cumulative

DNA microarrays:
“procedure, fabrication, data processing and analysis”

CMMB 461
University of Calgary
Gordon Chua
Suggested Readings
1. Harrington, C.A. et al. (2000) Monitoring gene expression using DNA microarrays. Curr. Opin. Microbiol. 3, 285–291.

2. Hughes, T.R. and Shoemaker, D.D. (2001) DNA microarrays for expression profiling. Curr. Opin. Chem. Biol. 5, 21–25.

3. Quackenbush, J. (2002) Microarray data normalization and transformation. Nat. Genet. 32, Suppl., 496–501.

4. Leung and Cavalieri (2003) Fundamentals of cDNA microarray data analysis. Trends in Genetics 19, 649–659.

5. Eisen, M.B. et al. (1998) Cluster analysis and display of genome-wide expression patterns. PNAS 95, 14863–14868.

1. Background and Introduction to Microarrays:
“What are they and what led to their development?”

Transcriptome
•Transcriptome: defined as a complete set of transcripts
encoded in the genome and their relative levels of expression in
a particular cell or tissue type under defined conditions.
- every single RNA in the cell

•Characterizing the transcriptome can identify:

•Genes exhibiting cell- and tissue-specific expression

•Genes aberrantly expressed in diseased cells and tissues (molecular basis behind the disease)

•Genes expressed in response to environmental toxins and pharmaceutical compounds (mode of action and side effects)

•Genes expressed in response to pathogens (mode of infection and virulence)
- how pathogens affect the host can be identified by looking at the genes in the transcriptome

• Obtained blood transcriptomes of 104 ASD cases and 82 controls (all males)
- extracted the RNA from affected and control individuals

• Found 55 genes differentially regulated as candidates to diagnose autism

• 68% accuracy for ASD identification with these 55 genes for males, poorly for females

• Blood test to detect autism may be possible.

Kong et al. 2012 PLoS ONE 7: e49475
Northern blotting
•Conventional method to detect RNA transcripts in a cell or tissue.

•To characterize the transcriptome of a human cell or tissue type, you would have to run 25,000 northerns and use 25,000 different probes!
- each probe will only bind to the RNA of interest

Wikipedia
DNA microarrays
•DNA/gene chip that contains single-stranded probes (25-70 nucleotides) with sequence complementary to a specific gene/mRNA

•Each probe is present in many copies in a spot on the microarray

•Fluorescent-labelled mRNAs or cDNAs are placed on the microarray to hybridize (complementary base pairing) to the probes

•The intensity of the fluorescence is proportional to the abundance of mRNA/cDNA that binds to the probe.

•Allows the simultaneous monitoring of the expression (mRNA) level of every gene in an organism in response to genetic and environmental perturbation
- monitor the expression of every gene

•In a single experiment (two weeks), can determine which genes in the genome are transcriptionally turned on or off
Microarray probe design
- probes are single-stranded and must be very specific

[Figure: side view and top view of the probes in a microarray spot]

1. Specificity: unique for each gene, no cross-hybridization

2. Homogeneity: bind to complementary DNA at the same Tm

3. Sensitivity: do not form secondary (2°) structures that interfere with hybridization
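The three design criteria can be sketched as simple sequence checks. This is only an illustration under stated assumptions, not a real probe-design pipeline: the function names, the 4-base stem length and the use of GC fraction as a crude stand-in for Tm are all hypothetical choices, and sequences are written in the DNA alphabet.

```python
# Hypothetical helper names; a toy illustration of the three criteria.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases; used here only as a rough proxy for Tm."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_specific(probe, transcripts):
    """Specificity: the probe's target site occurs in exactly one transcript."""
    target = reverse_complement(probe)
    hits = [name for name, seq in transcripts.items() if target in seq]
    return len(hits) == 1


def has_self_complement(probe, stem=4):
    """Sensitivity: flag probes where a stretch of `stem` bases could pair
    with a later part of the same probe (possible secondary structure)."""
    for i in range(len(probe) - stem + 1):
        piece = probe[i:i + stem]
        if reverse_complement(piece) in probe[i + stem:]:
            return True
    return False


if __name__ == "__main__":
    transcripts = {
        "geneA": "ATGGCCATTGTAATGGGCCGCTG",
        "geneB": "ATGTTTAAACCCGGGTTTAAA",
    }
    probe = reverse_complement("GCCATTGTAATGGG")
    print(is_specific(probe, transcripts), gc_fraction(probe),
          has_self_complement(probe))
```

Homogeneity would then mean choosing probes whose GC fraction (or a proper Tm estimate) falls in a narrow range across the whole array.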
Microarray procedure
- competitive hybridization

[Figure: microarray procedure. Wild type (control) and mutant/drug (experimental) samples contain transcripts X, Y and Z at different levels. Steps: isolate total mRNA; reverse transcribe and label cDNA with red (Cy5) and green (Cy3) fluorescent dyes; read relative levels. Spot outcomes: UP, DOWN, UNCHANGED, NOT PRESENT]

- for X there is more in wild type, so the spot shows as red
- equal amounts = yellow
- grey spot = gene is not expressed
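The four spot outcomes in the figure (UP, DOWN, UNCHANGED, NOT PRESENT) can be sketched as a simple rule on the two channel intensities. This is a minimal sketch under assumptions: it follows the convention of the normalization slides (Cy3 = control, Cy5 = experimental), and the detection floor of 100 and the 2-fold threshold are hypothetical values chosen for illustration.

```python
def classify_spot(cy5, cy3, floor=100.0, fold=2.0):
    """Classify one spot from its Cy5 (experimental) and Cy3 (control) signal.

    floor and fold are illustrative assumptions, not values from the slides.
    """
    if cy5 < floor and cy3 < floor:
        return "NOT PRESENT"          # grey spot: no signal in either channel
    # Clamp to 1.0 so a zero intensity cannot cause a division by zero.
    ratio = max(cy5, 1.0) / max(cy3, 1.0)
    if ratio >= fold:
        return "UP"                   # more transcript in the experimental sample
    if ratio <= 1.0 / fold:
        return "DOWN"                 # more transcript in the control sample
    return "UNCHANGED"                # similar amounts: yellow spot


if __name__ == "__main__":
    for cy5, cy3 in [(4000, 1000), (1000, 4000), (1500, 1400), (20, 30)]:
        print(cy5, cy3, classify_spot(cy5, cy3))
```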
2. Fabrication of Microarrays:
“How do they get oligonucleotide probes on a matrix at such high densities?”
Ink-jet microarrays (Agilent)
Ink-jet print-head uniformly deposits small, accurate volumes (picoliters) of nucleic acids, building the 60-mer oligonucleotide probes one base at a time onto a 1" × 3" glass slide

•Flexible, customizable
•Virtually all 60-mer probes are functional
•No need for expensive masks: cheaper
•Density: >1,000,000 spots/array

[Figure: 4 × 44K Expression microarray]

http://www.agilent.com/about/newsroom/lsca/background/2007/bg_microarrays.pdf
Photolithographic microarrays (Affymetrix)
•Oligonucleotide probe synthesis on wafer using a combination of photolithography and chemistry
- they use light and chemistry, and they add a blocking agent

•Photomask: opaque plate with holes that allow light to shine on specific locations on the silicon wafer

•Light removes the blocking compound, which prevents base addition to the wafer

•Flood with a chemical base (e.g. adenine), which attaches to the unblocked area of the wafer

•Repeat this process with blocking compound and a new photomask.

http://www.affymetrix.com/estore/about_affymetrix/outreach/educator/microarray_img_resources.affx#
Affymetrix microarray protocol
- used mainly for human samples, as they couldn't get enough RNA

•Involves amplification by transcription of cDNA into cRNA (requires less starting material)

•Probes consist of mismatch (MM) and perfect match (PM) types
- probes are shorter; one probe of each pair is the perfectly complementary sequence

•Sample is hybridized at a specific temperature where the sample will bind to PM and not MM probes

•High hybridization ratio of PM to MM indicates expression of a gene

•25 bp probes are less specific and require multiple probes/gene to compensate.

•The use of masks in making the microarray contributes to the higher cost
3. Labelling, Scanning and Image Processing
“Getting colourful microarray images and extracting the data”
Common ways to “label” nucleic acids

[Figure: four labelling strategies.
1. Random priming of double-stranded DNA: reaction contains labelled nucleotides.
2. Direct labeling of mRNA with fluorescent molecules.
3. Poly-T primed cDNA synthesis (reverse transcription) from poly-A mRNA: reaction contains labelled nucleotides.
4. Amplification by transcription: poly-T primer carrying a T7 promoter, “second strand” synthesis, then a T7 reaction that contains labelled nucleotides.]
- mRNA has a poly-A tail, so the probe is made with poly-T priming

Courtesy of Tim Hughes
Fluorescence dyes for labelling microarray samples (Cy3 and Cy5)

•Fluorescence: emission of light by a molecule that has absorbed light/radiation (excitation)

•Water-soluble fluorescent dyes of the cyanine family

•Cy5 dye is excited with a 635 nm red laser and detected by an emission filter that passes only 650-690 nm light

•Cy3 dye is excited with a 532 nm green laser and detected by an emission filter that passes only 557-592 nm light

•Fluorescent intensity is detected by a photomultiplier tube

[Figure: excitation and emission wavelengths of Cy3 and Cy5]

http://www.answers.com/topic/cy3-cy5-dyes-gif-1
http://www.jireurope.com/technical/images/GRAPH1.gif
What does microarray data initially look like?
•For each microarray, acquire two 16-bit TIFF images, one scanned in the Cy5 (red) channel and one in the Cy3 (green) channel

[Figure: red and green laser scanner (Genepix); Cy5 channel, Cy3 channel and merged images]
- measure the intensity of how red or green each spot is, which tells how much the gene is expressed

https://www.youtube.com/watch?v=yzBVOCwRanI
Image processing: preparation of TIFF images for raw data extraction

1. Array localization: find the spots (align the grid)

2. Spot quality assessment: flag bad spots (artifacts, bleeding)

3. Image segmentation: categorize each pixel as signal, background or other

4. Quantification: assign signal and background values to each spot
Image processing: array localization and spot quality assessment

Grid construction by defining the number and size of spots in columns and rows and the spacing in between

http://transcriptome.ens.fr/sgdb/tools/images
Image Segmentation
•Partition the image to determine which pixels constitute signal or background

•Spatial segmentation: use an inner circle to calculate the signal value and pixels outside the outer circle as local background

•Problem: sometimes the inner circle is not small enough for tiny spots

•Intensity-based segmentation: rank the intensity of pixels and take a cut-off equivalent to the approximate area of the spot = signal

•Can use a combination of the two types of segmentation

•Background correction can also use blank spots or control spots of exogenous DNA
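The two segmentation approaches can be sketched on a tiny image stored as a list of rows. This is a toy illustration under assumptions: the function names, the 5 × 5 image and the radii are made up, and real scanner software works on full TIFF images with a grid of many spots.

```python
import math


def spatial_segment(image, cx, cy, r_inner, r_outer):
    """Spatial segmentation: pixels inside the inner circle are signal,
    pixels outside the outer circle are local background, and the ring
    in between is ignored."""
    signal, background = [], []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            distance = math.hypot(x - cx, y - cy)
            if distance <= r_inner:
                signal.append(value)
            elif distance > r_outer:
                background.append(value)
    return signal, background


def intensity_segment(image, spot_area):
    """Intensity-based segmentation: rank all pixels and call the brightest
    `spot_area` pixels signal; the rest are background."""
    pixels = sorted((v for row in image for v in row), reverse=True)
    return pixels[:spot_area], pixels[spot_area:]


if __name__ == "__main__":
    image = [
        [10, 10, 10, 10, 10],
        [10, 50, 90, 50, 10],
        [10, 90, 99, 90, 10],
        [10, 50, 90, 50, 10],
        [10, 10, 10, 10, 10],
    ]
    print(spatial_segment(image, cx=2, cy=2, r_inner=1, r_outer=2))
    print(intensity_segment(image, spot_area=5))
```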
Quantification of signal and background
•Mean, median, mode and total intensity of segmented signal (microarray spots) and background pixels are reported in a text file (e.g. gpr file)

•Signal intensity = total spot intensity - background intensity

•Median is usually used because it is more robust to outliers

[Figure: gpr file showing spot location on microarray, genes, and Cy5 and Cy3 values]
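A short sketch of background subtraction shows why the median is preferred. The pixel values are invented for illustration (one pixel of 4000 stands in for a dust speck), and `spot_signal` is a hypothetical helper name.

```python
from statistics import mean, median


def spot_signal(signal_pixels, background_pixels, stat=median):
    """Background-corrected spot value: statistic of the signal pixels
    minus the same statistic of the background pixels."""
    return stat(signal_pixels) - stat(background_pixels)


if __name__ == "__main__":
    signal = [90, 92, 95, 91, 4000]   # 4000 is an artifact (e.g. dust)
    background = [10, 11, 9, 10]
    print("median-based:", spot_signal(signal, background))
    print("mean-based:  ", spot_signal(signal, background, stat=mean))
```

The single outlier pixel barely moves the median-based value but dominates the mean-based one.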
4. Microarray Data Pre-processing and Normalization
“Correct the data first before spending all your time analyzing it”
Log transformation of expression ratios
•When comparing relative abundance of gene expression between two samples, take the ratio of Cy5/Cy3 values (R/G)

•Log the expression ratios (log(R/G)): makes the distribution of the data more symmetrical (upregulated and downregulated genes are treated equally)

•2-fold change: R/G = 2 or 0.5, while log2(R/G) = +1 or -1

[Figure: distribution of R/G compared with log(R/G)]

http://www.bio.davidson.edu/people/macampbell/ACS_MAGIC/transform.html
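The symmetry gained from the log transformation can be checked with a few lines. This is a minimal sketch with made-up intensities; `log_ratio` is a hypothetical helper name.

```python
import math


def log_ratio(r, g):
    """log2 of the Cy5/Cy3 (R/G) expression ratio."""
    return math.log2(r / g)


if __name__ == "__main__":
    # 2-fold up, 2-fold down and unchanged, as raw ratio and as log2 ratio.
    for r, g in [(2000, 1000), (1000, 2000), (500, 500)]:
        print(f"R/G = {r / g:<4}  log2(R/G) = {log_ratio(r, g):+.1f}")
```

On the raw scale a 2-fold decrease is squeezed between 0 and 1 while a 2-fold increase is not; on the log2 scale both sit one unit from zero.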
Microarray Data Normalization

Required to correct for variations caused by:


•Unequal amounts of cDNA

•Distinct dye properties (fluorescence/quenching)

•Differences in dye incorporation

•Differences in scanning

Within array/single experiment normalization
[Figure: scatterplot of log intensity-Cy3 (control) against log intensity-Cy5 (experimental)]

•Assumption: most genes are not differentially regulated

•Graph looks like the vast majority of genes (spots) are up-regulated in the experiment

•More likely that more Cy5-labelled cDNA was used in the hybridization, or that the dyes have non-linear properties
Global linear normalization
•Assume equal quantities of cDNA, and therefore equal total intensity of Cy3 and Cy5

•Normalization constant = Σ(Cy3)/Σ(Cy5) [e.g. 10,000/20,000 = 0.5]

•For each gene, multiply the Cy5 intensity by the normalization constant (here 0.5), which brings the overall Cy5/Cy3 ratio to 1

•Only works partially because the relationship is not linear.

[Figure: two scatterplots of log intensity-Cy3 versus log intensity-Cy5]
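The global linear normalization can be worked through with the totals from the slide (Σ(Cy3) = 10,000 and Σ(Cy5) = 20,000). The three per-gene intensities below are invented so that they add up to those totals, and `global_normalize` is a hypothetical helper name.

```python
def global_normalize(cy5, cy3):
    """Scale every Cy5 value by Σ(Cy3)/Σ(Cy5) so both channels have
    the same total intensity."""
    constant = sum(cy3) / sum(cy5)
    return constant, [value * constant for value in cy5]


if __name__ == "__main__":
    cy3 = [2000, 3000, 5000]      # control channel, total 10,000
    cy5 = [4000, 6000, 10000]     # experimental channel, total 20,000
    constant, cy5_normalized = global_normalize(cy5, cy3)
    print("normalization constant:", constant)
    print("normalized Cy5:", cy5_normalized)
```

Because one constant is applied to every spot, this cannot remove a dye bias that changes with spot intensity.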
Scatterplot versus M/A (R/I) plot

[Figure: scatterplot of log2(G) versus log2(R), and the corresponding M/A plot with M = log2(R/G) and A = (½)*log2(R*G)]

•A = intensity (brightness) of microarray spots

•M = log expression ratio

•M/A plot allows for detection of intensity-dependent effects on log expression ratios.

•The plots above show that most of the greener spots are low intensity spots

http://compbio.pbworks.com/w/page/16252907/Microarray%20Normalization%20and%20Gene%20Expression%20Index
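The M and A values are computed per spot from the two channel intensities. This is a minimal sketch; the intensities are made up and `ma_values` is a hypothetical helper name.

```python
import math


def ma_values(r, g):
    """Return (M, A) for one spot: M is the log2 expression ratio,
    A is the mean log2 intensity of the two channels."""
    m = math.log2(r / g)
    a = 0.5 * math.log2(r * g)
    return m, a


if __name__ == "__main__":
    spots = [(4096, 1024), (256, 512), (2048, 2048)]
    for r, g in spots:
        m, a = ma_values(r, g)
        print(f"R={r:>5} G={g:>5}  M={m:+.1f}  A={a:.1f}")
```

Plotting M against A rotates the log2(R) versus log2(G) scatterplot so that "no change" lies along the horizontal line M = 0.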
Global Lowess (locally weighted linear regression)
•Performs a series of local regressions in overlapping windows with a weighted average of neighbouring spots (curve fitting and correction)

•Each regression is combined to make the Lowess smooth curve (weighted average values: closer spots have greater weight than far-away spots)

[Figure: M/A plot with a regression window; the fitted line is a function of mean intensity]
Global Lowess (locally weighted linear regression)
•Normalized log (R/G)=log(R/G)-Lowess correction

•Lowess correction: the deviation/distance of the Lowess curve from the zero axis, which is subtracted from the log ratio of each spot

•The output is that log ratios at all intensities have a mean of 0
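The idea of Lowess normalization can be sketched in plain Python. This is a simplified illustration under assumptions: the window fraction of 0.4 and the tricube weighting are common choices rather than values from the slides, the robustness iterations of full Lowess are left out, and the function names are hypothetical. It is slow for large arrays; real analyses use an optimized library routine.

```python
def lowess_curve(a, m, frac=0.4):
    """Fit a weighted straight line around each spot and return the
    fitted M value at that spot's intensity A."""
    n = len(a)
    k = max(2, int(round(frac * n)))          # spots per local window
    fitted = []
    for x0 in a:
        distances = sorted(abs(x - x0) for x in a)
        h = distances[k - 1] or 1e-12         # window half-width
        # Tricube weights: closer spots count more, spots beyond h count zero.
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in a]
        sw = sum(w)
        mean_a = sum(wi * x for wi, x in zip(w, a)) / sw
        mean_m = sum(wi * y for wi, y in zip(w, m)) / sw
        sxx = sum(wi * (x - mean_a) ** 2 for wi, x in zip(w, a))
        sxy = sum(wi * (x - mean_a) * (y - mean_m)
                  for wi, x, y in zip(w, a, m))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_m + slope * (x0 - mean_a))
    return fitted


def lowess_normalize(a, m, frac=0.4):
    """Normalized log ratio = log ratio - Lowess correction."""
    curve = lowess_curve(a, m, frac)
    return [mi - ci for mi, ci in zip(m, curve)]


if __name__ == "__main__":
    a = [6.0 + i for i in range(10)]          # spot intensities (A)
    m = [0.5 * x - 3.0 for x in a]            # pure intensity-dependent bias
    print([round(v, 6) for v in lowess_normalize(a, m, frac=0.5)])
```

In the example the log ratios contain nothing but an intensity-dependent trend, so the normalized values come out at zero (up to rounding).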
5. Spot and replicate filtering
“Improving the quality of data”

Filtering out low intensity spots
•The normalized log ratios at low intensity spots show greater
variation and are less reliable to identify differentially-expressed
genes

•Use some arbitrary cut-off for low intensity spots

[Figure: self-hybridization experiment]
Replicate filtering
•Plot the normalized log ratios from two replicate experiments

•Blue spots are within two standard deviations (SD) between both replicates and are kept, while brown spots (> 2 SD) are removed

Quackenbush (2002) Nature Genetics Suppl. 32
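One way to apply the two-standard-deviation rule is sketched below. The slide does not spell out the exact calculation, so this is an assumption: the difference between the two replicate log ratios is computed for every spot, and spots whose difference lies more than 2 SD from the mean difference are dropped. The function name and the data are hypothetical.

```python
from statistics import mean, stdev


def replicate_filter(rep1, rep2, n_sd=2.0):
    """Return one True/False flag per spot: True if the two replicate
    log ratios agree to within n_sd standard deviations."""
    differences = [x - y for x, y in zip(rep1, rep2)]
    centre = mean(differences)
    spread = stdev(differences)
    return [abs(d - centre) <= n_sd * spread for d in differences]


if __name__ == "__main__":
    rep1 = [0.1, -0.2, 1.0, 0.0, 0.3, -1.1, 0.2, 0.05, -0.4, 3.0]
    rep2 = [0.15, -0.1, 0.9, 0.05, 0.25, -1.0, 0.1, 0.0, -0.5, -0.5]
    keep = replicate_filter(rep1, rep2)
    print([i for i, ok in enumerate(keep) if not ok])  # indices of removed spots
```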


Dye-reversal replicates
•Uneven incorporation of Cy3 and Cy5 dyes can cause false positives for differentially-expressed genes.

•If a common mRNA sample is labelled with both Cy3 and Cy5 and hybridized on a microarray, then all spots should have a ratio of 1 (log ratio of 0)

[Figure: dye bias (spots are not at 0)]

•Solution: dye reversal/swap experiment

1. Sample A-Cy3 vs. Sample B-Cy5
2. Sample B’-Cy3 vs. Sample A’-Cy5

•Merged normalized log ratios = [log2(A/B) - log2(B’/A’)]/2 (the swapped ratio is subtracted so that both terms measure A relative to B and the dye bias cancels)

Dabney and Storey (2007) 8:129-138
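A small worked example shows how a dye swap cancels dye bias. The numbers are invented: sample A truly has 4 times more transcript than sample B, and Cy3 is assumed to read twice as bright as Cy5. Both hybridizations give a Cy3/Cy5 log ratio; because the biology is inverted in the swap while the dye bias is not, the swap ratio is subtracted (equivalent to adding log2(A’/B’)). The function name is hypothetical.

```python
import math


def merge_dye_swap(a_cy3, b_cy5, b_cy3, a_cy5):
    """Merge one spot from a dye-swap pair.

    Hybridization 1: sample A in Cy3, sample B in Cy5.
    Hybridization 2: sample B' in Cy3, sample A' in Cy5.
    """
    forward = math.log2(a_cy3 / b_cy5)   # log2(A/B), includes dye bias
    swapped = math.log2(b_cy3 / a_cy5)   # log2(B'/A'), same dye bias
    return (forward - swapped) / 2       # bias cancels, log2(A/B) remains


if __name__ == "__main__":
    # True amounts: A = 400, B = 100. Cy3 signal is doubled by dye bias.
    merged = merge_dye_swap(a_cy3=800, b_cy5=100, b_cy3=200, a_cy5=400)
    print("hyb 1 alone:", math.log2(800 / 100))
    print("merged:     ", merged)
```

Hybridization 1 on its own overstates the change, while the merged value recovers the true log2(400/100).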
“Now that your data is normalized, what do
you do next?”

•Identify differentially-expressed genes

•Cluster analysis

Identifying differentially expressed genes
The most straightforward way is to use a fixed fold-change cut-off (usually two-fold)

[Figure: log ratio plots (axes from -10 to +10) with a 2-fold cutoff]

Problem is that the variability of the log ratio is greater at lower intensities.

At lower intensity spots, genes can be misidentified as differentially expressed.

At higher intensity spots, differentially-expressed genes can be missed.
Z-score transformation
•Measures the number of standard deviations a particular data point is from the mean/median

•Using a sliding window, calculate the local mean and standard deviation within a window surrounding each data point (e.g. 0.25 log units of spot i)

•Zi = (ri - mi)/si, where ri is the log ratio of spot i, and mi and si are the local mean and standard deviation, respectively
- no need to know the formula, only that it measures how far a spot is from the local mean in standard deviations

[Figure: sliding window of 0.5 log units]

•Spots with |Zi| > 1.96 are differentially regulated at the 95% confidence level

Quackenbush (2002) Nature Genetics Suppl. 32
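The sliding-window Z-score can be sketched as follows. This is a minimal illustration under assumptions: the window is taken as all spots whose intensity A lies within 0.25 log units of spot i, windows with fewer than three spots are skipped, and the function name and data are hypothetical.

```python
from statistics import mean, stdev


def local_z_scores(a, m, half_width=0.25):
    """Z-score of each spot's log ratio relative to the spots of
    similar intensity (within half_width log units of its A value)."""
    scores = []
    for a_i, m_i in zip(a, m):
        window = [m_j for a_j, m_j in zip(a, m) if abs(a_j - a_i) <= half_width]
        if len(window) < 3:
            scores.append(None)               # too few neighbours to judge
            continue
        s_i = stdev(window)
        scores.append((m_i - mean(window)) / s_i if s_i > 0 else 0.0)
    return scores


if __name__ == "__main__":
    a = [10.0 + 0.01 * i for i in range(20)]
    m = [0.1 if i % 2 == 0 else -0.1 for i in range(19)] + [3.0]
    z = local_z_scores(a, m)
    flagged = [i for i, v in enumerate(z) if v is not None and abs(v) > 1.96]
    print(flagged)
```

Because the standard deviation is local, the same log ratio can be significant among tight high-intensity spots and not significant among noisy low-intensity spots.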


Learning objectives: you should be able to…

•Explain how microarrays are used to study transcriptomes

•Describe how RNA samples are prepared for microarray experiments

•Describe the characteristics of microarray probes

•Explain why it is important to normalize microarray data

•Describe the two approaches to identify differentially-expressed genes
