CMMB 461 Dna Microarray 1 2019 For D2L1

Final not cumulative
DNA microarrays:
“procedure, fabrication, data processing and analysis”
CMMB 461
University of Calgary
Gordon Chua
Suggested Readings
1. Harrington, C.A. et al. (2000) Monitoring gene expression using DNA microarrays. Curr. Opin. Microbiol. 3, 285-291.
2. Hughes, T.R. and Shoemaker, D.D. (2001) DNA microarrays for expression profiling. Curr. Opin. Chem. Biol. 5, 21-25.
3. Quackenbush, J. (2002) Microarray data normalization and transformation. Nat. Genet. 32, Suppl., 496-501.
4. Leung and Cavalieri (2003) Fundamentals of cDNA microarray data analysis. Trends in Genetics 19, 649-659.
5. Eisen, M.B. et al. (1998) Cluster analysis and display of genome-wide expression patterns. PNAS 95, 14863-14868.
1. Background and Introduction to Microarrays:
“What are they and what led to their development?”
Transcriptome
•Transcriptome: the complete set of transcripts encoded in the genome and their relative levels of expression in a particular cell or tissue type under defined conditions.
- every single RNA in the cell
•Characterizing the transcriptome can identify:
•Genes exhibiting cell- and tissue-specific expression
•Genes aberrantly expressed in diseased cells and tissues (molecular basis behind the disease)
•Genes expressed in response to environmental toxins and pharmaceutical compounds (mode of action and side effects)
•Genes expressed in response to pathogens (mode of infection and virulence)
- how do pathogens affect the host? The genes involved can be identified from the transcriptome
• Obtained blood transcriptomes of 104 ASD cases and 82 controls (all males)
- RNA was extracted from affected and control individuals
• Found 55 differentially regulated genes as candidates to diagnose autism
• 68% accuracy for ASD identification with these 55 genes for males; performed poorly for females
• A blood test to detect autism may be possible.
Kong et al. 2012 PLoS ONE 7: e49475
Northern blotting
•Conventional method to detect RNA transcripts of a cell or tissue.
•To characterize the transcriptome of a human cell or tissue type, you would have to run 25,000 northerns and use 25,000 different probes!
- each probe binds only to its RNA of interest
Wikipedia
DNA microarrays
•DNA/gene chip that contains single-stranded probes (25-70 nucleotides) with sequence complementary to a specific gene/mRNA
•Each probe is present in many copies in a spot on the microarray
•Fluorescently labelled mRNAs or cDNAs are placed on the microarray to hybridize (complementary base pairing) to the probes
•The intensity of the fluorescence is proportional to the abundance of mRNA/cDNA that binds to the probe.
•Allows the simultaneous monitoring of the expression (mRNA) level of every gene in an organism in response to genetic and environmental perturbation
- monitor the expression of every gene
•In a single experiment (two weeks), can determine which genes in the genome are transcriptionally turned on or off
Microarray probe design
- probes are single stranded and must be very specific
[Figure: side view and top view of probe sequences on the microarray]
1. Specificity: unique for each gene, no cross-hybridization
2. Homogeneity: bind to complementary DNA at the same Tm
3. Sensitivity: do not form secondary (2°) structures that interfere with hybridization
Microarray procedure (competitive hybridization)
[Figure: wild type (control) and mutant/drug (experimental) samples containing transcripts X, Y and Z at different abundances]
Isolate total mRNA from the wild type (control) and the mutant/drug (experimental) sample
Reverse transcribe and label cDNA with red (Cy5) and green (Cy3) fluorescent dyes
Relative levels at each spot: UP, DOWN, UNCHANGED or NOT PRESENT
- for X there is more in wild type, so the spot shows as red
- equal amount = yellow
- grey spot = gene is not expressed
2. Fabrication of Microarrays:
“How do they get oligonucleotide probes on a matrix at such high densities?”
Ink-jet microarrays (Agilent)
Ink-jet print-head uniformly deposits small, accurate volumes (picoliters) of nucleic acids, building the 60-mer oligonucleotide probes one base at a time onto a 1" × 3" glass slide
•Flexible, customizable
•Virtually all 60-mer probes are functional
•No need for expensive masks: cheaper
•Density: >1,000,000 spots/array
[Figure: 4 × 44K Expression microarray]
http://www.agilent.com/about/newsroom/lsca/background/2007/bg_microarrays.pdf
Photolithographic microarrays (Affymetrix)
•Oligonucleotide probe synthesis on wafer using a combination of photolithography and chemistry
- they use light and chemistry, and add a blocking agent
•Photomask: opaque plate with holes that allow light to shine on specific locations on the silicon wafer
•Light removes the blocking compound, which prevents base addition to the wafer
•Flood with a chemical base (e.g. adenine), which attaches to the unblocked area of the wafer
•Repeat this process with blocking compound and a new photomask.
http://www.affymetrix.com/estore/about_affymetrix/outreach/educator/microarray_img_resources.affx#
Affymetrix microarray protocol
- used mainly for human samples, because they couldn't get enough RNA
Involves amplification by transcription of cDNA into cRNA (requires less starting material)
Probes consist of perfect-match (PM) and mismatch (MM) types
- probes are shorter; one probe of each pair is the perfectly complementary sequence
Sample is hybridized at a specific temperature where the sample will bind to PM and not MM probes
High hybridization ratio of PM to MM indicates expression of a gene
25 bp probes are less specific and require multiple probes/gene to compensate.
The use of masks in making the microarray contributes to the higher cost
3. Labelling, Scanning and Image Processing
“Getting colourful microarray images and extracting the data”
Common ways to “label” nucleic acids
[Figure: four labelling schemes]
•Random priming of double-stranded DNA: reaction contains labelled nucleotides
•Direct labeling of mRNA with fluorescent molecules
•Poly-T primed cDNA synthesis (reverse transcription): reaction contains labelled nucleotides
•Amplification by transcription: poly-T primer carrying a T7 promoter, “second strand” synthesis, then a T7 reaction that contains labelled nucleotides
- mRNA has a poly-A tail, so synthesis is primed with poly-T
Courtesy of Tim Hughes
Fluorescence dyes for labelling microarray samples (Cy3 and Cy5)
•Fluorescence: emission of light by a molecule that has absorbed light/radiation (excitation)
•Water-soluble fluorescent dyes of the cyanine family
•Cy5 dye is excited with a 635 nm red laser and detected by an emission filter that passes only 650-690 nm light
•Cy3 dye is excited with a 532 nm green laser and detected by an emission filter that passes only 557-592 nm light
•Fluorescent intensity is detected by a photomultiplier tube
[Figure: excitation wavelengths and emission wavelengths of Cy3 and Cy5]
http://www.answers.com/topic/cy3-cy5-dyes-gif-1
http://www.jireurope.com/technical/images/GRAPH1.gif
What does microarray data initially look like?
•For each microarray, acquire two 16-bit TIFF images, one scanned with the Cy5 (red) channel and one with the Cy3 (green) channel
[Figure: red and green laser scanner (Genepix); Cy5 channel, Cy3 channel and merged images]
- measure the intensity of how red or green each spot is, which tells how much the gene is expressed
https://www.youtube.com/watch?v=yzBVOCwRanI
Image processing: preparation of TIFF images for raw data extraction
1. Array localization: find the spots (align the grid)
2. Spot quality assessment: flag bad spots (artifacts, bleeding)
3. Image segmentation: categorize each pixel as signal, background or other
4. Quantification: assign signal and background values to each spot
Image processing: array localization and spot quality assessment
Grid construction by defining the number and size of spots in columns and rows and the spacing in between
http://transcriptome.ens.fr/sgdb/tools/images
Image Segmentation
•Spatial segmentation: partition the image to determine which pixels constitute signal or background
•Use an inner circle to calculate the signal value and pixels outside the outer circle as local background
•Problem: sometimes the inner circle is not small enough for tiny spots
•Intensity-based segmentation: rank the intensity of pixels and take a cut-off equivalent to the approximate area of the spot = signal
•Can use a combination of the two types of segmentation
•Background correction can also use blank spots or control spots of exogenous DNA
Quantification of signal and background
•Mean, median, mode and total intensity of segmented signal (microarray spots) and background pixels are recorded in a text file (e.g. gpr file)
•Signal intensity = total spot intensity - background intensity
•Median is usually used because it is more robust to outliers
[Figure: example results file listing genes, spot location on microarray, and Cy5 and Cy3 values]
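The background subtraction above can be sketched in a few lines. This is only an illustration, assuming the segmented pixel values for one spot and its local background are already available as lists; the function name spot_signal and the pixel values are made up for the example, and the median is used as the summary because of its robustness to outliers.

```python
from statistics import median


def spot_signal(spot_pixels, background_pixels):
    """Background-corrected intensity of one spot, using medians of the segmented pixels."""
    return median(spot_pixels) - median(background_pixels)


# Hypothetical Cy5 pixel values for one spot; 65535 is a saturated outlier pixel.
cy5_spot = [1200, 1250, 1300, 1280, 65535]
cy5_background = [100, 110, 90, 105, 95]

signal = spot_signal(cy5_spot, cy5_background)
print(signal)
```

The single saturated pixel would dominate a mean, but it does not move the median.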
4. Microarray Data Pre-processing and Normalization
“Correct the data first before spending all your time analyzing it”
Log transformation of expression ratios
•When comparing relative abundance of gene expression between two samples, take the ratio of Cy5/Cy3 values (R/G)
•Log-transform the expression ratios (log(R/G)): makes the distribution of the data more symmetrical (upregulated and downregulated genes are treated equally)
•2-fold change: R/G = 2 or 0.5, while log2(R/G) = +1 or -1
[Figure: distribution of R/G versus log(R/G)]
http://www.bio.davidson.edu/people/macampbell/ACS_MAGIC/transform.html
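A small worked example of the symmetry argument, assuming background-corrected red (Cy5) and green (Cy3) intensities for a single spot. The function name log2_ratio and the intensities are illustrative only.

```python
import math


def log2_ratio(red, green):
    """Log2 expression ratio of Cy5 (red) over Cy3 (green)."""
    return math.log2(red / green)


up = log2_ratio(2000, 1000)    # 2-fold up:   R/G = 2
down = log2_ratio(1000, 2000)  # 2-fold down: R/G = 0.5

print(up, down)
```

On the raw scale the two changes are 2 and 0.5, which are not equally far from 1; on the log2 scale they are the same distance from 0 in opposite directions.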
Microarray Data Normalization
Required to correct for variations caused by:

•Unequal amounts of cDNA
•Distinct dye properties (fluorescence/quenching)
•Differences in dye incorporation
•Differences in scanning
Within array/single experiment normalization
[Figure: scatterplot of log intensity-Cy5 (experimental) versus log intensity-Cy3 (control)]
•Assumption: most genes are not differentially regulated
•The graph looks as if the vast majority of genes (spots) are up-regulated in the experiment
•More likely, more Cy5-labelled cDNA was used in the hybridization, or the dyes have non-linear properties
Global linear normalization
•Assume equal quantities of cDNA and equal total intensity of Cy3 and Cy5
•Normalization constant = Σ(Cy3)/Σ(Cy5) [e.g. 10,000/20,000 = 0.5]
•For each gene, multiply the Cy5 intensity by the normalization constant (0.5 in this example), which makes the ratio of total intensities = 1
•Only works partially because the relationship is not linear.
[Figure: two scatterplots of log intensity-Cy3 versus log intensity-Cy5]
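The 10,000/20,000 example can be reproduced with a short sketch. The three-spot array and the function name global_normalize are assumptions made for illustration; a real array has thousands of spots.

```python
def global_normalize(cy5, cy3):
    """Scale every Cy5 intensity by the constant sum(Cy3) / sum(Cy5)."""
    constant = sum(cy3) / sum(cy5)
    return constant, [value * constant for value in cy5]


# Hypothetical array where the Cy5 channel is uniformly twice as bright.
cy3 = [1000, 4000, 5000]    # total 10,000
cy5 = [2000, 8000, 10000]   # total 20,000

constant, cy5_normalized = global_normalize(cy5, cy3)
print(constant, cy5_normalized)
```

Because one constant is applied to every spot, this only removes a uniform brightness difference; it cannot correct a bias that changes with intensity.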
Scatterplot versus M/A (R/I) plot
[Figure: scatterplot of log2(R) versus log2(G), and the corresponding M/A plot]
M = log2(R/G)
A = (½) × log2(R × G)
•A = intensity (brightness) of microarray spots
•M = log expression ratio
•M/A plot allows for detection of intensity-dependent effects on log expression ratios.
•The plots above show that most of the greener spots are low-intensity spots
http://compbio.pbworks.com/w/page/16252907/Microarray%20Normalization%20and%20Gene%20Expression%20Index
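A minimal sketch of converting one spot from (R, G) to (M, A), assuming R and G are background-corrected Cy5 and Cy3 intensities. The function name ma_values and the intensities are illustrative.

```python
import math


def ma_values(red, green):
    """Return (M, A) for one spot: M = log2(R/G), A = 0.5 * log2(R * G)."""
    m = math.log2(red / green)
    a = 0.5 * math.log2(red * green)
    return m, a


m, a = ma_values(4096, 1024)
print(m, a)
```

A is simply the average of log2(R) and log2(G), so this spot (log2 values 12 and 10) sits at A = 11 with M = 2.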
Global Lowess (locally weighted linear regression)
•Performs a series of local regressions in overlapping windows, with a weighted average of neighbouring spots (curve fitting and correction)
•The regressions are combined to make the Lowess smooth curve (weighted average values: closer spots have greater weight than far-away spots)
•The fitted line is a function of mean intensity
[Figure: M/A plot with a regression window and the fitted Lowess curve]
•Normalized log(R/G) = log(R/G) - Lowess correction
•Lowess correction: the deviation/distance of the Lowess curve from the zero axis, which is subtracted from the log ratio of each spot
•The result is that log ratios at all intensities have a mean of 0
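The idea can be sketched in plain Python. This is a simplified illustration, not the routine used by any particular analysis package: it assumes a tricube weight function, a window holding a fixed fraction (frac) of the spots, and no robustness iterations. The names tricube, lowess_curve and lowess_normalize are made up for the example.

```python
def tricube(u):
    """Tricube weight: 1 at the centre of the window, 0 at its edge and beyond."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def lowess_curve(a_values, m_values, frac=0.5):
    """Fit a weighted straight line around each spot and evaluate it at that spot's A."""
    n = len(a_values)
    k = min(n, max(3, int(frac * n)))  # number of spots in each local window
    curve = []
    for a0 in a_values:
        distances = sorted(abs(a - a0) for a in a_values)
        width = distances[k - 1] or 1e-12  # distance to the k-th nearest spot
        weights = [tricube((a - a0) / width) for a in a_values]
        total = sum(weights)
        mean_a = sum(w * a for w, a in zip(weights, a_values)) / total
        mean_m = sum(w * m for w, m in zip(weights, m_values)) / total
        sxx = sum(w * (a - mean_a) ** 2 for w, a in zip(weights, a_values))
        sxy = sum(w * (a - mean_a) * (m - mean_m)
                  for w, a, m in zip(weights, a_values, m_values))
        slope = sxy / sxx if sxx > 0 else 0.0
        curve.append(mean_m + slope * (a0 - mean_a))
    return curve


def lowess_normalize(a_values, m_values, frac=0.5):
    """Normalized M = M minus the Lowess curve at the same intensity A."""
    curve = lowess_curve(a_values, m_values, frac)
    return [m - c for m, c in zip(m_values, curve)]


# Hypothetical spots whose log ratio drifts with intensity (pure dye bias).
a_demo = [6, 7, 8, 9, 10, 11, 12, 13]
m_demo = [0.5 * a - 3 for a in a_demo]
normalized = lowess_normalize(a_demo, m_demo)
print([round(v, 6) for v in normalized])
```

In this toy data set every spot lies exactly on the bias trend, so all normalized log ratios come out at essentially zero; a truly regulated gene would remain as a deviation from the curve.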
5. Spot and replicate filtering
“Improving the quality of data”
Filtering out low intensity spots
•The normalized log ratios of low intensity spots show greater variation and are less reliable for identifying differentially-expressed genes
•Use some arbitrary cut-off for low intensity spots
[Figure: self-hybridization experiment]
Replicate filtering
•Plot the normalized log ratios from two replicate experiments against each other
•Blue spots agree within two standard deviations between both replicates and are kept, while brown spots (> 2 SD) are removed
Quackenbush (2002) Nature Genetics Suppl. 32
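One way to sketch this filter, under the assumption that agreement is measured on the difference between the two replicate log ratios of each spot and that the 2 SD limit is taken around the mean of those differences. The function name filter_replicates and the data are hypothetical.

```python
from statistics import mean, stdev


def filter_replicates(rep1, rep2, n_sd=2.0):
    """Return True for spots whose replicate difference lies within n_sd standard deviations."""
    diffs = [r1 - r2 for r1, r2 in zip(rep1, rep2)]
    centre, spread = mean(diffs), stdev(diffs)
    return [abs(d - centre) <= n_sd * spread for d in diffs]


# Hypothetical normalized log ratios; the last spot disagrees strongly between replicates.
rep1 = [1.0, -0.5, 0.2, 2.0, 0.0, -1.2, 0.8, 3.0]
rep2 = [1.1, -0.4, 0.1, 1.9, 0.1, -1.1, 0.9, -2.5]

keep = filter_replicates(rep1, rep2)
print(keep)
```

Only the last spot, which is strongly up in one replicate and strongly down in the other, is flagged for removal.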

Dye-reversal replicates
•Uneven incorporation of Cy3 and Cy5 dyes can cause false positives for differentially-expressed genes.
•If a common mRNA sample is labelled with both Cy3 and Cy5 and hybridized on a microarray, then all spots should have a mean ratio of 1 (log ratio of 0)
•Dye bias: spots are not at 0
•Solution: dye reversal/swap experiment
1. Sample A-Cy3 vs. Sample B-Cy5
2. Sample B’-Cy3 vs. Sample A’-Cy5
•Merged normalized log ratio = [log2(B/A) + log2(B’/A’)]/2, with both terms expressed as sample B over sample A so that the dye bias cancels
Dabney and Storey (2007) 8:129-138
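A worked sketch of why the swap removes dye bias, assuming both log ratios are expressed as sample B over sample A before averaging. The intensities and the 1.5-fold Cy5 brightness bias are invented for illustration, and merged_log_ratio is a hypothetical helper name.

```python
import math


def merged_log_ratio(a_cy3, b_cy5, b_cy3, a_cy5):
    """Average the B/A log ratios from a dye-swap pair of arrays."""
    array1 = math.log2(b_cy5 / a_cy3)  # array 1: B in Cy5, A in Cy3
    array2 = math.log2(b_cy3 / a_cy5)  # array 2 (swapped): B' in Cy3, A' in Cy5
    return (array1 + array2) / 2


# True abundance: A = 1000, B = 2000 (B/A = 2); Cy5 reads 1.5x too bright.
single_array = math.log2(3000 / 1000)  # array 1 alone, biased upwards
merged = merged_log_ratio(a_cy3=1000, b_cy5=3000, b_cy3=2000, a_cy5=1500)
print(round(single_array, 3), round(merged, 3))
```

The bias inflates the ratio on the first array and deflates it by the same factor on the swapped array, so the average recovers the true 2-fold difference (log ratio of about 1).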
“Now that your data is normalized, what do
you do next?”
•Identify differentially-expressed genes
•Cluster analysis
Identifying differentially expressed genes
The most straightforward way is to use a fixed fold-change cut-off (usually two-fold)
[Figure: two plots of log ratio (-10 to +10) with the 2-fold cutoff marked]
The problem is that the variability of the log ratio is greater at lower intensities.
At lower intensity spots, genes can be misidentified as differentially expressed.
At higher intensity spots, differentially-expressed genes can be missed.
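A fixed cut-off is easy to sketch, assuming normalized log2 ratios keyed by gene. The gene names X, Y and Z and their values are placeholders, and classify_by_fold_change is a hypothetical helper.

```python
import math


def classify_by_fold_change(log_ratios, fold=2.0):
    """Call each gene up, down or unchanged using a fixed fold-change cut-off."""
    cutoff = math.log2(fold)  # a 2-fold change corresponds to log2 ratio +/-1
    calls = {}
    for gene, m in log_ratios.items():
        if m >= cutoff:
            calls[gene] = "up"
        elif m <= -cutoff:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


calls = classify_by_fold_change({"X": 1.6, "Y": -1.2, "Z": 0.3})
print(calls)
```

The same cut-off is applied regardless of spot intensity, which is exactly the weakness described above.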
Z-score transformation
•Measures the number of standard deviations a particular data point is from the mean/median
•Using a sliding window, calculate the local mean and standard deviation within a window surrounding each data point (e.g. within 0.25 log units of spot i)
•Zi = (ri - mi)/si, where ri is the log ratio of spot i, and mi and si are the local mean and standard deviation, respectively
- no need to know the formula, only that it tells how far a spot is from the local mean in standard deviations
[Figure: sliding window, 0.5 log units]
•Spots with |Zi| > 1.96 are differentially regulated at the 95% confidence level
Quackenbush (2002) Nature Genetics Suppl. 32
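A minimal sketch of the local Z-score, assuming the window is defined on the intensity axis (A) as all spots within 0.25 log units of spot i, and that windows with fewer than three spots are left unscored. The function name local_z_scores and the data are hypothetical.

```python
from statistics import mean, stdev


def local_z_scores(a_values, m_values, half_window=0.25):
    """Z-score of each spot's log ratio against spots of similar intensity."""
    scores = []
    for a0, m0 in zip(a_values, m_values):
        neighbours = [m for a, m in zip(a_values, m_values)
                      if abs(a - a0) <= half_window]
        if len(neighbours) < 3:
            scores.append(None)  # too few spots to estimate a local SD
            continue
        spread = stdev(neighbours)
        scores.append((m0 - mean(neighbours)) / spread if spread > 0 else 0.0)
    return scores


# Hypothetical spots of similar intensity; the last one has a large log ratio.
a_demo = [10.0, 10.05, 10.1, 10.15, 10.2, 10.02, 10.12, 10.18]
m_demo = [0.1, -0.1, 0.0, 0.1, -0.1, 0.0, 0.05, 3.0]

z = local_z_scores(a_demo, m_demo)
significant = [score is not None and abs(score) > 1.96 for score in z]
print(significant)
```

Because the standard deviation is computed locally, a noisy low-intensity region needs a larger log ratio to reach |Z| > 1.96 than a tight high-intensity region.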

Learning objectives: you should be able to…
•Explain how microarrays are used to study transcriptomes
•Describe how RNA samples are prepared for microarray experiments
•Describe the characteristics of microarray probes
•Explain why it is important to normalize microarray data
•Describe the two approaches to identify differentially-expressed genes
