
Appl Microbiol Biotechnol (2006) 73:255–273

DOI 10.1007/s00253-006-0584-2

MINI-REVIEW

DNA microarray technology for the microbiologist: an overview
Armin Ehrenreich

Received: 11 July 2006 / Revised: 11 July 2006 / Accepted: 11 July 2006 / Published online: 17 October 2006
© Springer-Verlag 2006

A. Ehrenreich (*)
Institute of Microbiology and Genetics, Georg August University,
37077 Göttingen, Germany
e-mail: aehrenr@gwdg.de

Abstract DNA microarrays have found widespread use as a flexible tool to investigate bacterial metabolism. Their main advantage is the comprehensive data they produce on the transcriptional response of the whole genome to an environmental or genetic stimulus. This allows the microbiologist to monitor metabolism and to define stimulons and regulons. Other fields of application are the identification of microorganisms or the comparison of genomes. The importance of this technology increases with the number of sequenced genomes and the falling prices for equipment and oligonucleotides. Knowledge of DNA microarrays is of rising relevance for many areas in microbiological research. Much literature has been published on various specific aspects of this technique that can be daunting to the casual user and beginner. This article offers a comprehensive outline of microarray technology for transcription analysis in microbiology. It briefly discusses the types of DNA microarrays available, the printing of custom arrays, common labeling strategies for targets, hybridization, scanning, normalization, and clustering of expression data.

Introduction

DNA microarrays are a powerful tool for the investigation of various aspects of prokaryotic biology because they allow the simultaneous monitoring of the expression of all genes in any bacterium. They offer a more holistic approach to study cellular physiology and therefore complement the traditional “gene-by-gene” approaches (Wildsmith and Elcock 2001). Since the term DNA microarray was coined in publications from the laboratory of DeRisi et al. (1996) and Schena et al. (1995), this technique has evolved from a very specialized method that was available only to a few people (Bowtell 1999; Cheung et al. 1999; DeRisi et al. 1997; Lashkari et al. 1997) to a common tool with many different applications that has become important in microbiology (Dharmadi and Gonzalez 2004). There are several other types of microarrays, like protein microarrays, but the DNA microarray is by far the most widespread and will simply be termed microarray in this review.

The essence of microarray technology is the parallel hybridization, in a single experiment, of a mixture of labeled nucleic acids, called target, with thousands of individual nucleic acid species, called probes, that can be identified by their spatial position. The location of a specific probe on the array is termed spot or feature. Whereas the probes are immobilized on a solid support, the targets are applied as a solution onto the array for hybridization after fluorescent labeling (Brown and Botstein 1999). The nomenclature of probes and targets has sometimes been mixed up in the literature, but the definition used in this review was given in a special issue of Nature Genetics (Phimister 1999) and is now commonly agreed upon. Transcription analysis with microarrays is a complex process. Figure 1 gives a brief overview of the steps involved and discussed in this review. Much literature has been published on specific aspects, but the complexity of the topic and the amount of literature are sometimes daunting to people starting to approach the topic. This review aims to give an overview and explain the major applications of DNA microarrays in microbiology. It tries to clarify important points, but will not go into special experimental approaches, details of equipment, or advanced statistical methods that are not commonly used in microbiology.

[Figure 1: flow diagram.
Panel a, probe preparation: annotated genomic structure; PCR amplify probes; spotting the PCR products.
Panel a, target preparation: sample 1 and sample 2; extract total RNA; label RNA using fluorescent dyes; hybridize labeled targets; scan fluorescence signal; raw images of each channel; image analysis.
Panel b: image analysis (placement of feature indicators and quantification of fluorescence data); background correction and quality filtering; data transformation; data normalization; testing for differential gene expression; cluster analysis or biological interpretation; data storage in local or public databases.]

Fig. 1 Main steps in transcription analysis with microarrays. a Probes are generated from an annotated genome sequence and spotted on a microarray slide. For target preparation, RNA is extracted from two experimental conditions and labeled with fluorescent dyes by reverse transcription. The labeled target is then hybridized with the array, and the fluorescence of the features is determined using an array scanner. b After image analysis, quality filtering, data transformation, and normalization are done. The remaining steps are dependent on the experiment, but in most cases, the data are tested for differential gene expression, clustered, and finally stored in a database
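
The data-processing steps named in panel b of Fig. 1 can be illustrated with a small sketch. The Python snippet below is a minimal example under stated assumptions, not the procedure of any particular analysis package: the function names `log_ratios` and `median_normalize`, the `min_signal` threshold, and all intensity values are hypothetical, and centering on the median is only one simple form of global normalization. It subtracts the local background from each channel, filters out weak features, transforms the Cy-5/Cy-3 ratio to log2, and shifts the log ratios so that their median is zero.

```python
import math
from statistics import median


def log_ratios(cy5, cy3, bg5, bg3, min_signal=1.0):
    """Return per-feature log2(Cy5/Cy3) after background subtraction.

    Features whose background-corrected signal is below min_signal in
    either channel are filtered out and reported as None.
    """
    ratios = []
    for red, green, red_bg, green_bg in zip(cy5, cy3, bg5, bg3):
        red_corr = red - red_bg        # background correction, Cy-5 channel
        green_corr = green - green_bg  # background correction, Cy-3 channel
        if red_corr < min_signal or green_corr < min_signal:
            ratios.append(None)        # quality filtering (hypothetical rule)
        else:
            ratios.append(math.log2(red_corr / green_corr))  # transformation
    return ratios


def median_normalize(ratios):
    """Shift log ratios so that the median of the retained features is zero."""
    kept = [m for m in ratios if m is not None]
    if not kept:
        return list(ratios)
    center = median(kept)
    return [None if m is None else m - center for m in ratios]


# Hypothetical raw intensities for four features (arbitrary scanner units).
cy5 = [1100, 4100, 600, 150]
cy3 = [600, 1100, 1100, 200]
background = [100, 100, 100, 100]

raw = log_ratios(cy5, cy3, background, background, min_signal=100)
normalized = median_normalize(raw)
print(normalized)  # prints [0.0, 1.0, -2.0, None]
```

In this toy data set the fourth feature is dropped because its corrected Cy-5 signal is below the threshold, and the remaining log ratios are centered so that a feature with the typical ratio reads as unchanged. Testing for differential expression and clustering would operate on values of this kind.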

Types of DNA microarrays

Microarrays evolved from Southern blots (Southern 1975, 2001), via colony filters (Nguyen et al. 1995), to dot blots (Kafatos et al. 1979). “DNA macroarrays” or “filterarrays” were made in a next step of miniaturization by using robotic devices for spotting thousands of probes on a nylon membrane. This number was already enough to probe each gene of a bacterial genome. The targets were labeled radioactively (Granjeaud et al. 1999), thereby allowing only one hybridization at a time. The disadvantage of so-called one-channel experiments is that the variance of each single array affects the final expression ratios. This problem was solved by two-channel experiments in which two mRNA populations are labeled with different fluorescent dyes and are hybridized simultaneously on one array. Glass slides are commonly used as support because of their low background fluorescence (Schena et al. 1995, 1996). Moreover, the rigid glass slides allow much higher probe density than the flexible membranes of macroarrays, thereby reducing the amount of target required. Additionally, glass allows covalent linkage of DNA to the surface and is inert to high ionic strength washing and high temperature.

There are three main types of DNA microarrays in widespread use: (1) microarrays where the probes are synthesized in situ directly onto the surface of the chip. The other two types have in common that independently synthesized probes are printed on special glass slides. According to the nature of the probe, they can be classified as (2) double-stranded DNA microarrays and (3) oligonucleotide DNA microarrays.

Affymetrix GeneChips The most prominent microarrays with in situ synthesized probes are the GeneChips manufactured by Affymetrix (Santa Clara, CA, USA). They are produced by chemical synthesis of the oligonucleotides directly on the coated quartz surface of the array (Hughes et al. 2001; Lipshutz et al. 1999; Lockhart et al. 1996; Warrington et al. 2000). This technology allows very high feature densities. It is typical to have 400,000 features on a commercial array (Lander 1999; Ramsay 1998). Therefore, they are called high-density oligonucleotide arrays. GeneChips are produced in a unique photolithographic process analogous to the methods used for production of microelectronics chips in combination with chemical reactions developed for combinatorial chemistry (Fodor et al. 1991). A quartz wafer is coated with a thin layer of a light-sensitive compound. This coating prevents the covalent coupling of an activated nucleotide. Exposure to light causes the removal of the chemical protection groups from the surface. Subsequently applied reactive derivatives of
single nucleotides can then be coupled. The attached nucleotides again carry a light-sensitive protection group that has to be removed by illumination before coupling the next nucleotide. Lithographic masks are used to block or transmit light onto specific features, thereby determining the order of nucleotides to be coupled to the growing oligonucleotides. In repeated cycles of masking, light exposure, and coupling, oligonucleotides of 25 residues' length are synthesized on the chip surface. As the specificity of a probe of 25 nucleotides may not be high enough, each probe (“match”) is accompanied by a negative control with a single differing base in the middle of the probe, termed mismatch probe. The performance of probe and mismatch probe can therefore be used to detect and eliminate cross-hybridization. Probe and mismatch probe are called a probe pair. Usually, 11 to 15 probe pairs, called a probe set, are used to represent a single gene. The very high feature density in this type of microarray enables the high number of controls. The automatic production process guarantees a very high reproducibility and enables a distinct experimental design: whereas the DNA microarray types discussed later in this text are typically used with two differentially labeled targets, Affymetrix chips are hybridized with only one labeled target. This allows different labeling techniques, excludes all dye effects described later, and eases experimental design and statistical analysis. However, all those advantages have to be balanced against the high costs for the design and use of such arrays. A large number of lithographic masks has to be created, and chip production is only possible by Affymetrix. This fact almost excludes any changes to the probes used in a microarray due to updates of sequences or annotation. Affymetrix chips are only available for very few microorganisms. Saccharomyces cerevisiae, Escherichia coli, Bacillus subtilis, Pseudomonas aeruginosa, and Salmonella typhimurium are the only ones listed on the Affymetrix Web site. Others would require an expensive custom design by the company. Another fact that has to be kept in mind is that the whole equipment for hybridizing, scanning, and analyzing Affymetrix chips is proprietary.

There have been other reports on DNA microarrays where the probes are synthesized directly onto the chip surface by using inkjet technology and conventional solid-phase phosphoramidite technology. However, so far, there is no widespread use of this technology, although companies like Agilent Technologies (Palo Alto, CA, USA) are now using the principle for custom-made arrays (Hughes et al. 2001).

Printed microarrays Here the probes are synthesized independently and then spotted on the surface of the array by a microarray spotter (Hegde et al. 2000). There are two different technologies: contact printer and noncontact printer. Contact printers spot the features by various types of pins. These include split or channeled pins, flat-tipped pins, and “pin and ring” type of pins (Zhou and Thompson 2004). All the pins initially dip into a solution of the probe and then onto the slide surface, thereby placing small droplets in the range of less than 1 up to a few nanoliters on the surface of the slide. This results in features of 100–150 μm in diameter, with their centers positioned in a 190- to 250-μm grid. Before spotting the next probe, the pins are automatically washed. From one up to hundreds of pins can be assembled in so-called printheads. The features printed by one pen of a printhead are sometimes referred to as a pen group or subgrid. Some practical points have to be kept in mind: when the printhead has multiple pins, their length has to be perfectly aligned to produce features of similar size. Even slight misalignments can result in features of varying size or missing features. There is also a finite number of features a pin can print before it has to be replaced. It is very important to confirm by preliminary tests that all pens produce features of identical size to avoid systematic biases in the data produced (Hessner et al. 2004).

Noncontact printers use bubble jet (Okamoto et al. 2000) or inkjet (Lemmo et al. 1998) technology analogous to computer printers. They shoot small droplets containing the probe on the surface of the chip. Problems that occur with this technology are cross-contamination and clogging of the capillaries, which result in missing spots.

The main advantage of printed microarrays is their standardized dimension. Historically, the first microarrays were printed on microscope slides; therefore, a size of 25.25×75.75 mm and a thickness of 1.0 to 1.2 mm is commonly used. This allows a free choice of spotters, hybridization equipment, scanners, and software from different suppliers, as well as modification of the slide chemistry. On the other hand, the feature density that can be obtained with printed microarrays is much lower than with GeneChips: about 10,000 to 30,000 features can be spotted on a single chip. Although there are prespotted slides for various model organisms and companies that do commercial custom DNA microarray spotting, it is also possible to spot the DNA microarrays in the laboratory (Bowtell 1999; Cheung et al. 1999). However, this is not a cheap undertaking. It is advisable to install the spotter in a room with a controlled environment or, better yet, a clean room with regulated temperature and humidity. This is important because the small volumes of liquid evaporate quickly, and it is hard to get reproducible results otherwise. Moreover, dust particles can interfere with spotting. Because of the costs associated and the special expertise needed for printing DNA microarrays, many research institutes have microarray core facilities to handle this task (Searles 2003). Despite these difficulties, custom-made DNA microarrays offer the advantage of producing arrays for any species or strains,
irrespective of commercial interests. Moreover, it is possible to print varying numbers of arrays, change slide chemistry, quickly adjust to progress in annotation, exclude probes for genes of no interest, or include probes of specific relevance such as intergenic regions.

Double-stranded DNA microarrays There are two major types of probes that are used with DNA microarray printers: double-stranded DNA and oligonucleotides. Double-stranded DNA commonly results from polymerase chain reaction (PCR) amplification (Duggan et al. 1999). A 200- to 800-bp length of amplified DNA is recommended, but larger fragments of up to 1.3-kb length also work (Heller et al. 1997). In typical microarray design, each probe DNA corresponds to one gene. This represents the original type of DNA microarrays where cDNA molecules from Arabidopsis thaliana were amplified by PCR and spotted (Schena et al. 1995). In prokaryotes, two specific primers together with chromosomal DNA as template are used to amplify genes or parts thereof. However, there are also numerous variations to this strategy. For example, clones from a shotgun library that originates from a sequencing project can be used as a template, thereby permitting the usage of shorter primers. Such clones allow the amplification of parts of the genes with only one specific primer. This way, it is possible to amplify about 80% of the genes of a typical prokaryote with only one specific primer, thereby greatly reducing the costs. Moreover, it increases the accuracy because only the correct combination of primers and template results in a PCR product of the expected size. Nevertheless, the generation of whole-genome DNA microarrays by high-throughput PCR amplification is a very laborious and logistically demanding process. Extensive quality control by gel electrophoresis, purification of products, and repetition of dropout reactions are necessary.

The double-stranded DNA is printed on slides with a positively charged coating (Aboytes et al. 2003). In most cases, they are coated with poly-L-lysine or 3-aminopropyltrimethoxysilane (APS). Spotting is typically done with a 1:1 mixture of purified PCR products and dimethylsulfoxide (DMSO), at a final DNA concentration of 0.2 to 1 mg/ml (Hegde et al. 2000). The DNA is bound to the slide surface by electrostatic interaction with the negatively charged phosphate backbone of the nucleic acid as shown in Fig. 2a (Sanchez-Cortes et al. 2002). This also helps to separate the two strands of the double-stranded DNA. Additional baking at approximately 80°C or UV cross-linking is thought to introduce covalent links, primarily between thymine residues in the DNA and the amino groups of the slide surface (Reed and Mann 1985; Saito et al. 1981). An additional blocking step is required to prevent nonspecific interaction of the slide

[Figure 2: schematic. Panel a: electrostatic interaction between the negatively charged DNA backbone and the positively charged NH3 groups of the slide surface, followed by covalent coupling by UV or heat. Panel b: nucleophilic addition of an NH2 group to an aldehyde group of the slide surface, followed by formation of a Schiff base with release of H2O.]

Fig. 2 Modes of probe immobilization on microarray slides. a Immobilization of double-stranded DNA on a slide coated with aminosilane. The negatively charged phosphate backbone is attached to the positively charged slide surface by electrostatic interaction. Additional covalent linkage is achieved by baking or UV irradiation. b Covalent attachment of oligonucleotides with a 5′ amino linker to a slide surface that exposes aldehyde groups

surface with target DNA, especially for arrays printed on poly-L-lysine coated slides. This “postprocessing” is done by incubating the slides in a freshly prepared succinic anhydride solution that readily reacts with the amino groups from poly-L-lysine (Xiang and Brownstein 2003). The coated slides are commercially available from many suppliers but can also be prepared in the laboratory by intense cleaning of special microscopic slides and dipping them in poly-L-lysine solution. Advantages of DNA microarrays made from spotting double-stranded DNA are their higher hybridization specificity and sensitivity and their lower cost. They are indispensable whenever the sequence of the organism under study is not available. Their biggest drawbacks are the laborious production of PCR products and the errors in probe identity that result from mistakes during their generation. It has been reported that 1 up to 5% of probes might have a wrong identity in commercial cDNA microarrays (Knight 2001).

Oligonucleotide DNA microarrays Using synthetic oligonucleotides as probes is an alternative to double-stranded DNA (Kane et al. 2000; Southern et al. 1999) because they need much less logistics and are less error-prone due to automatic manufacturing of the oligonucleotides by the suppliers and their well-documented delivery in microtiter plates. Their initial disadvantage of lower specificity and sensitivity as a result of short oligonucleotides of 25-bp length has been overcome by using longer probes with a length of 50 to 70 bp (Barczak et al. 2003; Bates et al. 2005; Calevro et al. 2004). This short probe length is a major advantage of oligoprobes because it allows the monitoring of the transcription of very small open reading frames or focusing the transcription analysis on intergenic regions. However, oligonucleotides as probes require a careful design (Emrich et al. 2003; Herold and Rasooly 2003; Rouillard et al. 2003). All calculated melting points must fall into a temperature range of 5°C, and self-homology has to be avoided. Because of their short size, oligonucleotides are commonly attached to the slide surface by covalent coupling. Otherwise, a significant amount of probe would be lost from the array surface during hybridization and washing. A large variety of chemical reactions has been proposed to achieve covalent coupling, but the majority of slides used for spotting oligonucleotides are coated with compounds providing aldehyde or epoxy functional groups. To achieve covalent linkage, oligonucleotides with modifications at the 5′ or at the 3′ end are used. This increases the availability of the probe sequence for hybridization with target because it is not fixed to the surface by its backbone or bases. A further increase in sensitivity can be obtained by inserting spacer molecules between the oligonucleotide and the slide surface (Beier and Hoheisel 1999; Ghosh and Musso 1987; Shchepinov et al. 1997). The most common modification of oligonucleotides is a 5′ amino group (Zammatteo et al. 2000). It offers a high flexibility in the choice of slide chemistry: as illustrated in Fig. 2b, aldehyde and epoxy groups react especially readily with the primary amino group. Modified oligonucleotides are normally spotted at a concentration of 10–30 μM. The conditions have to be adjusted so that the coupling can proceed. Finally, the functional groups of the array that are not part of a feature have to be blocked, similar to arrays made from double-stranded DNA probes. Depending on slide chemistry, this can be done, for example, by incubating the slides in the presence of low-molecular-weight primary amines. Printed microarrays can be stored for many months if they are protected from light and kept under completely dry conditions in a desiccator (Worley et al. 2000).

Methods used for target labeling

Many different fluorescent dyes and other labeling agents have been described in the literature (Badiee et al. 2003; Schena and Davis 2000), but the cyanine dyes Cy-3 and Cy-5 are most commonly used, offering strong fluorescence, similar chemical properties, well-separated fluorescence spectra, and little adherence to the chip surface. Contrary to common expectation, they are not green and red themselves; they get those colors only after scanning, by computer false coloring. There are two main strategies for their incorporation in cDNA by reverse transcription (RT) of RNA (Wildsmith et al. 2001).

Direct labeling In the direct labeling protocol, the dye is a derivative of a nucleotide triphosphate, like Cy-3 deoxyuridine triphosphate (dUTP) or Cy-3 deoxycytidine triphosphate (dCTP). It is incorporated during RT of the RNA into cDNA. One of the deoxynucleoside triphosphates (dNTPs), either the dCTP or the dUTP, needed by the reverse transcriptase is provided at lower concentration. In addition, the derivative of the corresponding dye is added (Khodursky et al. 2003), resulting in incorporation of the dye. For two-channel experiments, RNA prepared from cells grown at two different conditions is included in the hybridization experiment. One of the RNA preparations is labeled with Cy-3, the other with Cy-5. After labeling and removal of remaining free dye, roughly equal amounts of Cy-3 and Cy-5 dye incorporated in the cDNA are subjected to hybridization. Whereas direct labeling is more widespread, it has the fundamental problem that Cy-3 and Cy-5 are incorporated with different yields. In practice, this difference can be quite substantial because the Cy-3 and Cy-5 molecules have a different size. This results in a lower rate
of integration of the Cy-5-modified nucleotide in cDNA as compared with the Cy-3-modified one. This artificial bias has to be corrected by normalization to obtain relevant biological data.

Indirect labeling To circumvent this major source of error, a different strategy of labeling called indirect labeling is used. In this case, both RNA preparations are reverse-transcribed to cDNA in the presence of an aminoallyl-modified dUTP or dCTP. Since both preparations are labeled with the same molecule, there is no bias. Additionally, this modification is much smaller than Cy dyes and, thus, better incorporated in the cDNA. In a second step, N-hydroxysuccinimidyl ester (NHS ester) derivatives of Cy-3 or Cy-5 are coupled to the aminoallyl-modified cDNA molecules by a chemical reaction that is far less sensitive to the molecule size of the dye. The disadvantages of this protocol are the extreme moisture sensitivity of the NHS ester-modified dyes and the requirement of significantly more bench work. The often-stated advantage of requiring less RNA as starting material is neutralized by the losses due to the two purification steps.

Labeling of genomic DNA Labeled genomic DNA is used for comparative genomic studies (Borucki et al. 2003; Chan et al. 2003; Salama et al. 2000), as a reference target in normalization, or for slide quality control. The labeling is usually done by direct incorporation of Cy-3- or Cy-5-labeled nucleotides in a nick translation or by random priming with the Klenow fragment of DNA polymerase. For the random priming, DNA fragments of 1- to 3-kb size are usually generated by sonication, nebulization, or by digestion with restriction enzymes with a four-base recognition site, like AluI or Sau3AI. For a single hybridization, a labeling reaction contains 0.5–2 μg of genomic DNA (Amon and Ivanov 2003; Ye et al. 2001).

Target preparation A specific problem of working with prokaryotic organisms is that there are no widely adopted protocols for selectively labeling the mRNA. Whereas the mRNA of eukaryotic organisms has a poly(A) tail that can be utilized to specifically label mRNA with oligo(dT) primers, prokaryotic mRNA lacks the poly(A) tail, and random priming with either hexamers or nonamers has to be used. Therefore, only total RNA can be labeled. But only 4% of the total RNA is mRNA, the rest being mainly rRNA and tRNA (Neidhard et al. 1990; Talaat et al. 2000). The large amount of labeled RNA results in a higher background in DNA microarray experiments with prokaryotic organisms and requires a substantially higher amount of total RNA to be added to the labeling reaction. Although lower numbers have been published for certain protocols, as a rule of thumb, 20 to 25 μg of total RNA have to be included in a labeling reaction of a prokaryotic organism whereas only 2 to 5 μg of total RNA are needed for a eukaryote (Duggan et al. 1999). Numerous attempts to circumvent this problem have been published, such as preparation of polyadenylated mRNA from prokaryotes (Wendisch et al. 2001) or priming with a primer set that has a higher probability of priming the RT of mRNA than of rRNA (Talaat et al. 2000). However, none of them has been widely adopted. An additional major problem when working with prokaryotic mRNA as compared with eukaryotic mRNA is its distinct instability. Prokaryotic mRNA only has a half-life in the range of 40 s to 20 min for individual transcripts (Kushner 1996). The average, as measured with isotopic labeling, is around 1 min (Baracchini and Bremer 1987; Neidhard et al. 1990), whereas microarray experiments indicate that 80% of E. coli transcripts have half-lives ranging from 3 to 8 min (Bernstein et al. 2002). Because bacterial RNAses are responsible for this rapid turnover (Kushner 2002), this clearly indicates that prokaryotic mRNA is much harder to handle than eukaryotic mRNA. This instability demands special care during preparation of prokaryotic RNA to avoid artifacts. It is possible to fail to observe the expression of certain genes simply because of degradation of the corresponding mRNA during the experiment. It also has to be kept in mind that, depending on the promoter and its regulation, transcription initiation in E. coli takes place at a rate of once per second to once per generation (Record et al. 1996). The transcript elongation proceeds at a rate of 40–50 nucleotides per second (Richardson and Greenblatt 1996). This means that even for a relatively large protein, like lacZ, the first β-galactosidase proteins appear 1 min after the initial signal for gene induction occurred. This should illustrate how quickly microorganisms adjust their transcription to environmental changes that might occur during harvesting of the culture and cell disruption. Therefore, to prevent observing mainly the Save Our Souls (SOS) response or the response to oxygen limitation, it is critical to cool cells immediately and carefully during harvest and to use an appropriate method for cell disruption and RNA extraction that minimizes RNA degradation. Traditional methods of cell disruption, like incubation with lysozyme, French pressing, or sonication, are often not suitable because they take too much time. A very flexible solution that works for many organisms is to freeze cells using liquid nitrogen and grind the frozen cells in a cooled ball mill. The resulting powder of ground cells is dissolved in a buffer containing a high concentration of the strong protein denaturant guanidinium isothiocyanate before thawing, thereby inhibiting any RNAse activity. Alternatively, a cold “stop solution” composed, for example, of ethanol and phenol at a low pH can be used to stop any transcription immediately and prevent RNA breakdown (Moore et al. 2005). Total bacterial RNA can then be prepared with
commercial kits such as RNeasy from Qiagen (Hilden, Germany). The quality of RNA is pivotal for transcription analysis, and RNA quality should be controlled, for example, by denaturing formaldehyde agarose gel electrophoresis or RT PCR.

Hybridization and data acquisition

Hybridization Hybridization of DNA microarrays can be done in two different ways. The “classical” approach includes placing labeled, denatured target on a slide and carefully covering it with a coverslip. This requires some skill because the coverslip needs to be level to prevent gradients in hybridization and avoid trapped air bubbles. The slide is then placed in a humid chamber to prevent desiccation during hybridization and incubated at the hybridization temperature. The hybridization temperature ranges mostly from 40 to 65°C for 5 to 12 h. It depends on the organism studied and the composition of the hybridization buffer (Cheung et al. 1999). In most cases, saline sodium citrate (SSC) buffer with added detergent is used. Addition of Denhardt's solution, sheared salmon sperm DNA, or tRNA reduces the background. The addition of formamide, dextran sulfate, or polyethylene glycol can improve binding of low-copy number transcripts (Cheung et al. 1999; Wildsmith and Elcock 2001). Hybridization temperature is critical for oligonucleotide slides and has to be carefully optimized. As a rule of thumb, the optimization can start at a hybridization temperature 15°C below the mean melting temperature of the oligonucleotides used. Following hybridization, the slides are washed to remove nonspecifically bound target. More stringent washing steps are performed at the end of the washing procedure. This can be achieved either by decreasing the ionic strength or increasing the washing temperature (Wildsmith and Elcock 2001). Typical protocols use decreasing SSC buffer concentrations, first with small concentrations of sodium dodecyl sulfate (SDS), then without SDS. The slides are finally dried by centrifugation. It is important to scan the arrays within several hours after hybridization because the fluorescence signal deteriorates with time.

As an alternative to this classical approach, automatic array hybridization stations can be used (Wildsmith and Elcock 2001). They provide hassle-free hybridization and washing of the slides by running programmed protocols.
[...] based on the same chemistry have to be handled identically. They are not well suited to deal with small numbers of arrays based on varying chemistries.

Scanning The microarrays are scanned with microarray scanners. Their appropriate driver and image analysis software determines the raw values (Bassett et al. 1999). GenePix (Axon Instruments, Inc., Union City, CA, USA) and ArrayVision (Imaging Research, Ontario, Canada) are examples of widely used software for image analysis and raw data acquisition. In principle, it is possible to scan standard-sized slides with any scanner. Exceptions are a few slide types with a nonplanar surface that exclude confocal scanners. For successful data acquisition, a data file is needed that identifies the features and defines their dimensions and locations. The GenePix array list (.gal) file format is often used for this purpose. The scanners mostly use lasers for exciting the surface of the hybridized microarray with a resolution of a few micrometers (Bowtell 1999). The resolution of scanning should be better than 10% of the spot size, that is, features of 150-μm size need to be scanned at least at 15 μm resolution. The fluorescence emitted from the dyes hybridized to the features is collected and quantified by photomultiplier tubes or charge-coupled device (CCD) cameras. There is a variety of scanners on the market differing in their technological configurations (Ramdas et al. 2001). Normally, the scanner generates gray-scale images of the fluorescence at 532 and 635 nm. The data are stored in a lossless tagged image file format (TIFF) that is used for quantification by image analysis. A color depth of 2 bytes is characteristic for most scanners, which means that each pixel can assume 65,536 different intensity levels (0 to 65,535). The sensitivity of the scanner has to be adjusted to ensure that most of the pixels in the picture do not saturate its dynamic range. It is convenient to roughly adjust the sensitivity of the scanner during a prescan so that constitutive controls result in roughly equal signals. For microarrays made from double-stranded DNA, this can also be done by spotting chromosomal DNA and adjusting these spots to a ratio of 1 during scanning.

Image analysis To quantify the fluorescence of the features via image analysis, pixels have to be assigned either to a spot or the background. This resulting boundary is often visualized in the software for acquiring raw values by a circle surrounding the feature and is then called a feature indicator. In most cases, the image analysis software allows the placement of the feature indicators in a semiautomatic
The results do not depend on the ability of the researcher or automatic manner. Even if an algorithm places the
and are very reproducible. However, hybridization and feature indicators automatically, it is advisable to manually
washing conditions have to be fine-tuned in earlier experi- control this placement. In real life, the spots might be
ments to the probes and slide chemistry used. They are irregular, or fluorescent impurities on the chip surface may
therefore most adequate when a large number of arrays confuse algorithms. It is common to define all pixels inside

a feature indicator as foreground and all adjacent pixels within a radius of three times the feature diameter as the local background. The next step is the quantification of the image data by calculating the arithmetic mean (Zhou et al. 2000) or, better, the median (Petrov et al. 2002) of the intensities of the foreground and background pixels. The resulting data are stored in the form of a table. Common spreadsheet programs and all sorts of commercial and free software tools can be used for the next steps of data analysis (Brazma and Vilo 2001; Conway et al. 2002).

Background correction and filtering
First, the so-called background correction has to be made. That simply means subtracting the local background value from the foreground intensity (Benes and Muckenthaler 2003; Dharmadi and Gonzalez 2004). Additionally, an intensity-based filtering of the data should be done to ensure the quality of the signals and to prevent artifacts. The first and most important of these quality assessments is to exclude features with intensity smaller than the background or assign them a “floor” value, which is often the local or global background. The assignment of a floor value allows interpretation of genes that are transcribed at one condition but are totally switched off at the other condition. This is not uncommon with bacteria where some operons are specifically induced by an inducer but are not transcribed in its absence. Because the data at the lower range of intensities tend to be much more variable, it is good practice to accept only the intensity of features that lie significantly above background and assign floor values to the rest. Significantly above background means that they should have intensities that are more than one or two standard deviations above the local background. Two standard deviations above background mean that they represent valid data with a confidence level of 95.5% (Quackenbush 2002). After passing this filtering, the ratio of means or the ratio of medians is calculated from the background-corrected intensities in the “red” and “green” channels and results in the actual raw data from a DNA microarray experiment. A further filtering that can be applied to verify quality and prevent experimental artifacts would be the comparison of the ratio of means, the ratio of medians, and the regression ratio for each feature. The regression ratio is the linear regression between the intensities of pixels within a circle of twice the diameter of the feature. The slope of the line of best fit according to the least-squares method is the regression ratio. The distinct feature of this ratio is its independence of rigidly defining the background or foreground pixels. Whenever the regression ratio, the ratio of means, and the ratio of medians deviate too much from each other, there is a problem with spot morphology or feature indicator placement. These data should be omitted from further analysis. For example, the GenePix software exports all values needed for the quality assessments described.

Data transformation
The untransformed values of expression ratios have the disadvantage of treating up- and downregulated genes differently. That means that a fourfold upregulated gene has an expression ratio of 4, whereas a fourfold downregulated gene has an expression ratio of 0.25. To circumvent this problem, the expression ratios are often handled as their logarithms to the base 2. This results in the values 2 and (−2) for a fourfold up- and downregulated gene, which has a number of practical advantages. There is another important reason for transforming data to their logarithms: most statistical methods applied to the data afterward assume a normal distribution. Log-transforming data is a mathematically simple strategy to achieve this (Dharmadi and Gonzalez 2004). Figure 3 shows two common types of plots that are used to visualize microarray data.

Methods for normalization and testing for differential gene expression

Normalization
Before it is possible to draw biological conclusions or to apply sophisticated statistics, it is important to normalize the data. This corrects for systematic biases resulting basically from different amounts of RNA used for labeling, different incorporation efficiencies of the Cy-3 and Cy-5 dyes in the labeling protocols, and different detection efficiencies of the dyes (Yang et al. 2002a,b). A number of normalization methods have been proposed in the literature (Duggan et al. 1999; Kroll and Wolfl 2002; Quackenbush 2002). The most widely used method is based on the assumption that the total sum of intensities should be equal in both channels, and therefore, the ratio between them should be one. A normalization factor is calculated from the overall ratio, and ratios for all features are scaled accordingly. A somewhat similar approach uses spotted chromosomal DNA for normalization, although this strategy is confined to microarrays made by spotting double-stranded DNA (DeRisi et al. 1997). In addition, a number of other approaches are in use such as linear regression analysis and the intensity-dependent locally weighted linear regression (LOWESS) normalization that corrects for intensity-dependent effects sometimes observed in the data. A microbiologist has to be cautious with these normalization strategies because they imply statistically that there must be a downregulated gene for every upregulated one. This assumption might be more appropriate from a statistical point of view with large eukaryotic genomes. It is not necessarily correct for bacteria because the number of

Fig. 3 Plots used to represent microarray experiments. a In a scatter plot, each feature represents a point. The coordinates are the logarithm of the intensities in the red and green channel (axes: log10R and log10G). In this plot, normalized data should map around the bisecting line. b In an R-I plot, the logarithm of the ratio of the red and green intensity for each feature, log2(R/G), is plotted as a function of the product of intensities, log10(R*G). This plot easily reveals systematic intensity-dependent bias
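
The quantities plotted in Fig. 3, together with the total-intensity normalization described in the text, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular software package: the function names and the example intensities are hypothetical, and the inputs are assumed to be background-corrected, strictly positive intensities.

```python
import math

def scatter_coordinates(red, green):
    """Coordinates of one feature in the scatter plot (Fig. 3a)."""
    return math.log10(red), math.log10(green)

def ri_coordinates(red, green):
    """Coordinates of one feature in the R-I plot (Fig. 3b):
    log10 of the intensity product and log2 of the intensity ratio."""
    return math.log10(red * green), math.log2(red / green)

def total_intensity_factor(reds, greens):
    """Global normalization factor, assuming (as the text describes) that
    the summed intensities of both channels should be equal."""
    return sum(reds) / sum(greens)

def normalized_log_ratios(reds, greens):
    """Scale the green channel by the global factor, then return log2(R/G)."""
    factor = total_intensity_factor(reds, greens)
    return [math.log2(r / (g * factor)) for r, g in zip(reds, greens)]

# Hypothetical background-corrected intensities (arbitrary units):
# one fourfold upregulated, one fourfold downregulated, two unchanged features.
reds = [400.0, 100.0, 800.0, 200.0]
greens = [100.0, 400.0, 800.0, 200.0]

print(normalized_log_ratios(reds, greens))
# A green channel that is uniformly twice as bright is corrected by the factor,
# so the log ratios come out the same as above.
print(normalized_log_ratios(reds, [2 * g for g in greens]))
```

In this toy data set the fourfold up- and downregulated features end up at log ratios of 2 and −2, which is the symmetry that motivates the log2 transformation. The global factor is only appropriate when the equal-total-intensity assumption holds, which the text cautions is not always the case for bacteria.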

genes is much smaller. Bacteria can regulate large groups of genes in a highly coordinated fashion, and the total amount of mRNA may be vastly different. An extreme example that is nevertheless common in microbiological research is the comparison of cells with a significantly different growth rate. In such cases, it is better to normalize by using “housekeeping genes”. This means that the normalization factor is calculated from genes that do not change their expression level under the given experimental conditions (DeRisi et al. 1996). Of course, one has to be careful in selecting those genes, especially with “nonstandard” organisms, because this method requires some preceding experimental experience with gene expression in this organism. The normalization factor is then calculated from a small set of genes fulfilling this condition. If such knowledge is not available, an alternative for normalization is by using external controls. This strategy is similar to the use of housekeeping genes but does not correct for different amounts of RNA in the target preparations. Therefore, it is best combined with another normalization strategy. Exogenous controls require spotting of control genes on the array that have no homology to genes of the genome under study and, therefore, will not exhibit cross-hybridization. These controls need to have similar general characteristics, that is, melting temperature, guanine–cytosine (GC) content, and length compared with the rest of the probes on the chip. RNA complementary to those genes is transcribed in vitro and added to both labeling mixes in known concentrations and ratios. These ratios can then be used to calculate a normalization factor. The detection limit and the saturating concentration of the experiment can be estimated from a series of increasing concentrations of controls. External controls with no complementary spiked-in RNA can be used as negative controls to judge the stringency of the washing protocol.

Testing for differential gene expression
The following steps of transcription analysis highly depend on the question the experiment addresses. Nevertheless, it is of interest for most experiments performed which genes are differentially expressed. Although many strategies have been proposed, the most widely used is to set a fixed fold-change cutoff (Cui and Churchill 2003; Yang et al. 2002a,b). Usual values are two- to fivefold because DNA microarrays have been shown to be reproducible at this level, especially when the data are validated by repeated experiments (DeRisi et al. 1996). This approach seems straightforward and performs well, especially with the relatively strong gene regulation in microorganisms. To detect more subtle regulations, it is better to calculate the mean and standard deviation for the specific data set. Afterward, every data point can be transformed to its Z score (Quackenbush 2002). This number simply describes how many standard deviations it is above or below the mean. By defining the cutoff in

terms of a Z score, the threshold is more specific to the particular data set. DNA microarray data tend to be more variable at lower than at higher intensity levels. Therefore, it is better to define an intensity-dependent Z score threshold for the data set. For this, the mean and standard deviation are calculated in a sliding window to establish a local intensity-dependent Z score for each data point (Yang et al. 2002a,b). Values whose Z score exceeds 1.96 in absolute value can be regarded as differentially expressed genes at a 95% confidence level (Quackenbush 2002).

Design of experiments and data verification

Replication
Replication is crucial to achieve reliable data for microarrays (Spruill et al. 2002). There are three different kinds of replication that have to be distinguished. Spotting the same probes multiple times on each array is the first one. This provides some backup in case a spot cannot be evaluated due to technical artifacts, like dye precipitations or dust particles, and allows the calculation of the “on chip variance” (Worley et al. 2000). Moreover, it improves the data quality by calculating the average expression ratios on the chip. The next important type of replication is to label and hybridize RNA that has been prepared from one biological experiment several times. This corrects variance that results from differences in the labeling reactions. The most important replication in this category is the so-called dye switch or dye swap. In a dye switch, the RNA sample that was first Cy-3-labeled is Cy-5-labeled next and vice versa (Kerr and Churchill 2001a,b; Tseng et al. 2001; Yang and Speed 2002). This is important because there are gene-specific dye effects. Additionally, the intensity of Cy-3 and Cy-5 fluorescence differs depending on the amount of dye that is bound to a feature by hybridization (Rhodius et al. 2002).

All replications mentioned so far are often called technical replications. They are only used to correct technical sources of error during the transcription analysis. An additional “technical” source of variability is the RNA preparation. However, it should be kept in mind that the major variability comes from the biological experiment, which has to be planned and conducted with great care. Microarrays monitor the mRNA concentration of all genes of a cell with considerable precision. Bacteria in turn can detect environmental changes with extreme sensitivity, and mRNA concentrations show a rapid change in response to them. Therefore, the major task when trying to get reproducible microarrays is to perform highly reproducible physiological or genetic experiments. Good array data depend as much upon good microbial physiology technique as they do on good DNA array technique (Conway and Schoolnik 2003).

A statistical estimation deduces that at least three replicates should be done (Lee et al. 2000). The ratios of replicate experiments are averaged or, better, their geometric mean is calculated (Quackenbush 2002).

Design of experiments
The experimental design is also critical (Churchill 2002; Kerr and Churchill 2001a,b; Nadon and Shoemaker 2002). For example, many genes in a bacterial cell are growth-rate-dependent. Therefore, one has to be very cautious in interpreting gene expression measured on different growth substrates if the growth rates of the cultures differ too much. A possibility to circumvent this problem is to work with cells grown in continuous cultures where the growth rates can be adjusted to be similar. However, this approach excludes the observation of dynamic responses.

The common case in microarray research is to compare transcription of two cell populations. Three groups of these two-condition experiments can be generally distinguished (Conway and Schoolnik 2003): (1) differential response to growth parameters. Examples for this would be studies of growing cells either in batch or in continuous culture on various carbon sources or other varying growth conditions such as aerobic vs fermentative growth (DeRisi et al. 1997; Oh et al. 2002; Pappas et al. 2004; Pashalidis et al. 2005; Paustian et al. 2002; Polen et al. 2003, 2005; Rossignol et al. 2003). (2) Treated vs untreated cultures. These experiments monitor the response of a cell population to various physiologic challenges such as, for example, heat shock or antibiotic treatment. They define genes belonging to certain stimulons in the cells and provide a picture of how the cell copes with certain stress situations (Alsaker and Papoutsakis 2005; Anthony et al. 2005; Beckering et al. 2002; Chhabra et al. 2006; Gao et al. 2004; Mascher et al. 2003). (3) Wild-type vs mutant strains. This group summarizes studies where the consequences of genetic mutations are monitored. They are often employed to define regulons, but special care must be taken to separate direct consequences of the mutation from indirect ones (Barbosa and Levy 2000; Cao et al. 2003; den Hengst et al. 2005; Ogura et al. 2002; Salmon et al. 2005).

If more than two conditions are to be investigated, for example, time series (Belland et al. 2003; Kucho et al. 2005), a single chip experiment is not enough, and several hybridizations have to be combined (Yang and Speed 2002). It is important to carefully plan this experiment to generate meaningful data, detect possible biases, and prevent the factors of interest from being masked by accumulating errors. Figure 4 shows examples of such experimental designs. The common reference design, as shown in Fig. 4a, is the predominant case (Conway and Schoolnik 2003). A problem with using references is that many genes in bacteria are completely switched off without their inducer. Therefore, a number of them will

not be expressed under the condition under which the reference was made. If any of those genes are expressed under other conditions, this will lead to infinite induction ratios for those genes. Some studies suggest using a mixture of reference RNAs obtained from several sampling conditions (Kucho et al. 2005; Laub et al. 2000), labeled oligonucleotides complementary to each probe (Dudley et al. 2002), or labeled chromosomal DNA as reference (Belland et al. 2003). Another strategy would be to define a base value in case there is a strong signal on one channel but no signal for that feature on the other to be able to calculate a ratio.

Fig. 4 Basic types of experimental design schemes with multiple samples. Each box represents one sample, and each arrow points from the green-labeled sample to the red-labeled sample. a Reference design (samples 1 to 4 and a reference); b loop design (samples 1 to 5); c all-pair design (samples 1 to 5)

Other experimental approaches with microarrays
Besides transcription analysis and many other minor applications, microarrays are routinely used in microbiology to compare genomes and identify microorganisms. These fields of application require the labeling of chromosomal DNA that is hybridized with the array. For the comparison of genomes, also called genomotyping, the chromosomal DNA from the bacterium that has to be typed is labeled. This labeled DNA is hybridized with a whole genome array from a reference strain. The chromosomal DNA from the reference strain is labeled with a second dye and included in the hybridization. If genes are present in both strains, the corresponding probe will yield a signal in both channels, whereas when the gene is absent in the typed strain, the signal in one channel is missing. Threshold values are defined on the basis of the data set to decide whether a gene is absent. This decision is often made with the GACK software (Kim et al. 2002; Stabler et al. 2005). The approach works only with closely related organisms and has the inherent limitation that it is only possible to decide which genes are missing from the typed strain as compared with the reference strain, not which ones the typed strain has in addition to the reference strain (Coenye et al. 2005; Lindroos et al. 2005; Molenaar et al. 2005; Paustian et al. 2005; Reen et al. 2005).

For the identification of microorganisms, probes specific to certain taxa are spotted on arrays. Mostly, 16S and 23S ribosomal RNA genes are used for this purpose. The labeled chromosomal DNA of the organism under investigation is hybridized with this chip in a one-channel experiment. By sophisticated design of these probes, it is possible to classify the organism depending on the probes that hybridize with the target (Belosludtsev et al. 2004; Lehner et al. 2005; Loy et al. 2002, 2005; Mitterer et al. 2004).

Data verification
Microarray data can easily contain errors originating from probe interchange, array production, labeling reactions, hybridization, and data acquisition. Therefore, it is strongly advisable to validate data of the most important genes with independent methods to quantify mRNA. Real-time RT-PCR (Gibson et al. 1996; Heid et al. 1996; Helmann et al. 2001; Wurmbach et al. 2003) or Northern blotting (Heller et al. 1997; Schuchhardt et al. 2000) are the most common options. In prokaryotic organisms, the operon structure can also give some hints: operons are expressed in a coordinated manner and often show a polar effect in the direction of transcription (Pappas et al. 2004). However, more indirect methods, like proteomics data, lacZ fusions, or enzyme activities, can also be used to back up transcription data (Rhodius et al. 2002). Moreover, they prove the relevance of the transcription data for these levels of cellular physiology. The best option to verify microarray expression data is real-time RT-PCR, also called QRT-PCR (quantitative RT-PCR). This PCR method allows the quantification of dozens of samples simultaneously. It has been shown that in most cases, the data from microarrays

and real-time RT-PCR are consistent (Mutch et al. 2002). However, in the majority of studies, microarray data compress the fold changes of expression as compared with real-time RT-PCR by two- to tenfold (Conway and Schoolnik 2003). This has been attributed to the smaller dynamic range of microarrays (Holland 2002; Pappas et al. 2004; Yuen et al. 2002). Northern blotting is another option for data validation, but it is only applicable to far fewer samples and is less quantitative. If a larger number of samples are to be checked, RNA dot blot analysis has been used to verify array data (Moore et al. 2005).

Data storage
Microarrays produce a vast amount of data. It is important to organize and store these data in databases (Bassett et al. 1999; Brazma et al. 2002; Sherlock and Ball 2005). This is true for the work in the laboratory and for the deposition of the data in public databases. Many journals require the deposition of the microarray data in public databases, like “National Center for Biotechnology Information (NCBI) Gene Expression Omnibus” (GEO) (Edgar et al. 2002), “KEGG EXPRESSION” database (Kanehisa et al. 2002), “ArrayExpress Database” (Brazma et al. 2003), or “Stanford Microarray Database” (Ball et al. 2005) and the submission of the assigned accession number prior to publication (Brazma et al. 2000). The minimum information about a microarray experiment (MIAME) specification was created to achieve accurate and consistent annotation of microarray experiments (Ball et al. 2002; Brazma 2001; Brazma et al. 2001). This specification tries to define a framework to describe the minimal data set required for a microarray experiment. It comprises data on the experimental design, information on the array design, the samples used, the RNA extraction and labeling, hybridization procedures and parameters, experimental data, and finally, a detailed description of strategy and controls used for normalization. The experimental data comprise image data, raw data, data after normalization, and data after averaging of replicates. For describing all these data in a structured way, a special XML format called microarray gene expression markup language (MAGE-ML) has been proposed (Spellman et al. 2002). The databases also offer online tools for data submission, like MIAMExpress (Brazma et al. 2003).

Cluster analysis of expression data

The underlying principle of applying clustering to expression data is the assumption that similar expression levels might indicate related biological function (Brazma and Vilo 2001). Therefore, insight into the function of unknown genes may be gained by observing whether they are coregulated with known genes or whether the genes are expressed or repressed as a group in response to a defined stimulus. Answers to these questions can be obtained by applying clustering algorithms (Claverie 1999; de Hoon et al. 2004; Eisen et al. 1998; Michaels et al. 1998; Sherlock 2000; Sturn et al. 2002). Typical software packages used for this step are GeneSpring (Silicon Genetics, San Carlos, CA, USA) or SpotFire Array Explorer (SpotFire, Inc., Cambridge, MA, USA). The interpretation of transcription data in the context of known functions, for example, biochemical pathways or a working model, is a hypothesis-based approach. In contrast to this, a purely statistical analysis, like clustering, can be employed without any prior hypothesis and might, at least in theory, lead to unexpected conclusions. This “unsupervised analysis” is seen as a major advantage of the functional genomics approach. However, it has to be stressed that clustering of data will always result in some clusters regardless of biological relevance (Clare and King 2002). This is a common characteristic of bioinformatics data. It might suggest totally unexpected coherences but does not prove anything per se without returning to the laboratory and doing classical experiments to find supporting evidence.

Clustering is applied to the averaged expression data from several microarray experiments where quality assessment, normalization, and technical controls have already been done. It can be applied to several distinct experiments or a time series. It is possible to cluster genes in groups according to the similarity of the expression in several experiments or to cluster the experiments to groups of similar gene expression. When these two clustering directions are combined, they are referred to as biclustering or two-way clustering (Cheng and Church 2000; Getz et al. 2000).

To simply cluster the genes according to their expression in several experiments, the expression level of each gene is represented as a point in an n-dimensional space. n is equal to the number of experiments or time points. This can easily be visualized with data from two experiments because the points (genes) can be represented by a two-dimensional coordinate system as shown in Fig. 5. Clusters can, under these circumstances, be visually recognized as points in closer vicinity.

Before any mathematical cluster analysis can be done, three things need to be defined:
– First, a distance measure between data points has to be selected. Most often, this is the Euclidean distance between points, but more sophisticated definitions are possible and result in different clusters. Therefore, this can be as important as the selection of the clustering algorithm. Some sort of “feeling” for the data set and trial and error is used for the choice of the right one.
– Second, a function that defines the quality of clustering results must be chosen. Most obviously, the defined distance measurement is used to minimize the distance

Fig. 5 Schematic illustrating how clustering algorithms work. In this example, data from only two experiments result in a two-dimensional data set (axes in all panels: Experiment 1 and Experiment 2). a Hierarchical data clustering produces a tree structure. b In K-means clustering, centroids, drawn as stars, are dispersed by the user, and data points are assigned to the clusters in an iterative algorithm. c Self-organizing maps start with a regular grid of centroids, represented as stars. Pulling the centroids to the centers of the clusters they represent in an iterative algorithm identifies the clusters
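
The K-means procedure of panel b can be sketched in plain Python. This is a minimal illustration of the standard assign-and-update iteration with Euclidean distance, not the implementation of any package named in this review: the function names, the example points, and the starting centroids are hypothetical, and the rule of keeping the old centroid for an empty cluster is an assumption made here for simplicity.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_means(points, initial_centroids, max_iter=100):
    """Cluster points around user-supplied starting centroids.

    Returns (assignment, centroids), where assignment[i] is the index of
    the centroid that points[i] belongs to.
    """
    centroids = [tuple(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        # Stop when no point changes its cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids

# Hypothetical log ratios of six genes in two experiments.
genes = [(0.1, 0.2), (0.0, 0.0), (0.2, 0.1), (2.0, 2.1), (2.2, 1.9), (1.9, 2.0)]
assignment, centroids = k_means(genes, [(0.0, 0.0), (1.0, 1.0)])
print(assignment)
print(centroids)
```

With these starting centroids the first three genes fall into one cluster and the last three into the other. The result of K-means generally depends on the number and initial placement of the centroids, which is why the user has to supply a guess for both.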

of each point in a cluster to the center of the cluster. But again, other methods are possible and necessary, depending on how noisy the data are.
– Finally, the algorithm for clustering needs to be selected. These algorithms try to find the best possible clustering results using the function that defines the quality of clustering. There is a vast collection of clustering algorithms described in the literature. The major ones can be separated into two groups depending on whether the user needs to make assumptions on the clusters a priori and specify the number of clusters to be found.

Hierarchical clustering
Hierarchical clustering as illustrated in Fig. 5a is a widely used method that does not need a priori information and results in a tree structure of increasing similarity (Khan et al. 1998; Lashkari et al. 1997; Schena et al. 1996). Every tree node represents a

cluster at some resolution. The size of the resulting clusters can therefore be defined by setting a threshold of a certain intercluster distance. The hierarchical clustering algorithms work either “top-down,” by starting with all genes in a single cluster and separating them based on a criterion of dissimilarity, or “bottom-up,” by starting with every individual gene in a single cluster and merging them consecutively based on a criterion of similarity. After hierarchical clustering, trees and color maps are the most natural representation of results (Wen et al. 1998). Hierarchical clustering gives good results with a clean data set but is very sensitive to noisy data.

The two other clustering algorithms mentioned here require an initial guess on the number of suspected clusters.

K-means clustering
K-means clustering, an alternative to hierarchical clustering, is argued to give good results when compact clusters are expected. As shown in Fig. 5b, the user disperses so-called centroids in the data space. The iterative algorithm assigns each data point to the cluster with the smallest distance to the next centroid. It then calculates a new one from the points that belong to the cluster and replaces the old centroid. The computation process is terminated when there is no further change in the assignment of gene points to the centroids (Lu et al. 2004; Xu 2004).

Self-organizing maps
Self-organizing maps (SOMs) are clustering algorithms that are related to K-means clustering but have been found to be superior in both robustness and accuracy when analyzing microarray data (Alsaker and Papoutsakis 2005; Garge et al. 2005; Tamayo et al. 1999). The algorithm, illustrated in Fig. 5c, is complex and starts with centroids dispersed as a regular grid among the data. Each data point then pulls the nearest centroid in little steps. The extent of pulling depends on the distance to the centroid. Centroids that come close to each other merge, and centroids with no movement will be deleted. When the remaining centroids are located in the center of the clusters, the computation stops (Xu 2004).

Conclusions

Because of falling prices for equipment and oligonucleotides, DNA microarrays are on their way to become common tools in the microbiological laboratory. They

and Engelmann 2000). The transcriptome data are usually more comprehensive because of the limited number of proteins that can be resolved in two-dimensional gels. Moreover, the relatively detailed knowledge on the genetics and biochemistry of prokaryotic organisms allows direct interpretation of the transcriptome data in active pathways. Many studies have shown that enzyme levels correlate with their respective gene expression profiles (Arfin et al. 2000; Smulski et al. 2001; Tao et al. 2001). Future investigations will show how far this will extend to metabolic flux data (Kromer et al. 2004; Oh and Liao 2000). The integration of transcription analysis with comparative genome sequencing, proteomics, metabolic flux analysis, and computer modeling of the cell physiology will be an important data source for systems biology, which represents a new approach for a global quantitative picture of cell physiology (Galitski 2004). The systems biology approach requires a fundamental framework involving several distinct steps: (1) definition of all components of a system; (2) systematic perturbation and monitoring of the components either genetically or by modification of the environment; (3) reconciliation of the experimentally observed responses with those predicted by a quantitative model; and (4) design and performance of new perturbations to distinguish between multiple or competing model hypotheses (Ideker et al. 2001). Transcription analysis is the most important method to monitor the mRNA abundance of each gene. It will be complemented by proteomics and metabolomics to provide the experimental data for model verification as required in steps 2 and 4 of the outlined strategy (Boyce et al. 2004). Because systems biology will form the foundation of metabolic engineering, transcription analysis will also be important in this area (Burja et al. 2003; Vemuri and Aristidou 2005).

Microarrays can be used in microbiology for a multitude of differing applications, from the study of gene regulation and bacterial response to environmental changes, genome organization, and evolutionary questions up to taxonomic and environmental studies. The knowledge of the main aspects of this technology helps to understand these specific applications.

Vital for further advances of microarray technology in microbiology will be the recognition of the importance of the physiological experiments ahead of the transcription analysis, the standardization of protocols and controls for transcription analysis (Benes and Muckenthaler 2003),
allow a dynamic view on the physiology of the living cell more integration of the data analysis with biochemical and
and have been compared with a kind of “microscope” genetic knowledge, and flexible and intuitive databases for
(Brown and Botstein 1999; Ferea and Brown 1999). An mining the vast amounts of data (Mlecnik et al. 2005).
inherent limitation of microarrays is that the resulting
Acknowledgments The author wants to thank Profs. G. Gottschalk,
transcriptome does not account for posttranslational events.
W. Liebl, and B. Bowien for their support. He is also grateful to Drs. T.
However, in most cases, there is a high correlation between Mascher, P. Ehrenreich, and B. Veith for critically reading the
transcriptome and proteome (Akutsu et al. 2000; Hecker manuscript. The microarray core facility in the institute of microbiology
and genetics is part of the competence network Göttingen “Genome Research on Bacteria” funded by the German Federal Ministry of Education and Research (BMBF).

References

Aboytes K, Humphreys J, Reis S, Ward B (2003) Slide coating and DNA immobilization chemistries. In: Blalock E (ed) A beginner's guide to microarrays. Kluwer, Boston, pp 1–41
Akutsu T, Miyano S, Kuhara S (2000) Inferring qualitative relations in genetic networks and metabolic pathways. Bioinformatics 16(8):727–734
Alsaker KV, Papoutsakis ET (2005) Transcriptional program of early sporulation and stationary-phase events in Clostridium acetobutylicum. J Bacteriol 187(20):7103–7118
Amon P, Ivanov I (2003) Genomic DNA labeling for hybridization with DNA arrays. Biotechniques 34(4):700–702, 704
Anthony JR, Warczak KL, Donohue TJ (2005) A transcriptional response to singlet oxygen, a toxic byproduct of photosynthesis. Proc Natl Acad Sci USA 102(18):6502–6507
Arfin SM, Long AD, Ito ET, Tolleri L, Riehle MM, Paegle ES, Hatfield GW (2000) Global gene expression profiling in Escherichia coli K12. The effects of integration host factor. J Biol Chem 275(38):29672–29684
Badiee A, Eiken HG, Steen VM, Lovlie R (2003) Evaluation of five different cDNA labeling methods for microarrays using spike controls. BMC Biotechnol 3:23
Ball CA, Sherlock G, Parkinson H, Rocca-Sera P, Brooksbank C, Causton HC, Cavalieri D, Gaasterland T, Hingamp P, Holstege F, Ringwald M, Spellman P, Stoeckert CJ Jr, Stewart JE, Taylor R, Brazma A, Quackenbush J (2002) Standards for microarray data. Science 298(5593):539
Ball CA, Awad IA, Demeter J, Gollub J, Hebert JM, Hernandez-Boussard T, Jin H, Matese JC, Nitzberg M, Wymore F, Zachariah ZK, Brown PO, Sherlock G (2005) The Stanford Microarray Database accommodates additional microarray platforms and data formats. Nucleic Acids Res 33(Database issue):D580–D582
Baracchini E, Bremer H (1987) Determination of synthesis rate and lifetime of bacterial mRNAs. Anal Biochem 167(2):245–260
Barbosa TM, Levy SB (2000) Differential expression of over 60 chromosomal genes in Escherichia coli by constitutive expression of MarA. J Bacteriol 182(12):3467–3474
Barczak A, Rodriguez MW, Hanspers K, Koth LL, Tai YC, Bolstad BM, Speed TP, Erle DJ (2003) Spotted long oligonucleotide arrays for human gene expression analysis. Genome Res 13(7):1775–1785
Bassett DE Jr, Eisen MB, Boguski MS (1999) Gene expression informatics—it's all in your mine. Nat Genet 21(Suppl 1):51–55
Bates SR, Baldwin DA, Channing A, Gifford LK, Hsu A, Lu P (2005) Cooperativity of paired oligonucleotide probes for microarray hybridization assays. Anal Biochem 342(1):59–68
Beckering CL, Steil L, Weber MH, Volker U, Marahiel MA (2002) Genomewide transcriptional analysis of the cold shock response in Bacillus subtilis. J Bacteriol 184(22):6395–6402
Beier M, Hoheisel JD (1999) Versatile derivatisation of solid support media for covalent bonding on DNA-microchips. Nucleic Acids Res 27(9):1970–1977
Belland RJ, Zhong G, Crane DD, Hogan D, Sturdevant D, Sharma J, Beatty WL, Caldwell HD (2003) Genomic transcriptional profiling of the developmental cycle of Chlamydia trachomatis. Proc Natl Acad Sci USA 100(14):8478–8483
Belosludtsev YY, Bowerman D, Weil R, Marthandan N, Balog R, Luebke K, Lawson J, Johnston SA, Lyons CR, Obrien K, Garner HR, Powdrill TF (2004) Organism identification using a genome sequence-independent universal microarray probe set. Biotechniques 37(4):654–658, 660
Benes V, Muckenthaler M (2003) Standardization of protocols in cDNA microarray analysis. Trends Biochem Sci 28(5):244–249
Bernstein JA, Khodursky AB, Lin PH, Lin-Chao S, Cohen SN (2002) Global analysis of mRNA decay and abundance in Escherichia coli at single-gene resolution using two-color fluorescent DNA microarrays. Proc Natl Acad Sci USA 99(15):9697–9702
Borucki MK, Krug MJ, Muraoka WT, Call DR (2003) Discrimination among Listeria monocytogenes isolates using a mixed genome DNA microarray. Vet Microbiol 92(4):351–362
Bowtell DD (1999) Options available—from start to finish—for obtaining expression data by microarray. Nat Genet 21(Suppl 1):25–32
Boyce JD, Cullen PA, Adler B (2004) Genomic-scale analysis of bacterial gene and protein expression in the host. Emerg Infect Dis 10(8):1357–1362
Brazma A (2001) On the importance of standardisation in life sciences. Bioinformatics 17(2):113–114
Brazma A, Vilo J (2001) Gene expression data analysis. Microbes Infect 3(10):823–829
Brazma A, Robinson A, Cameron G, Ashburner M (2000) One-stop shop for microarray data. Nature 403(6771):699–700
Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FC, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, Vingron M (2001) Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nat Genet 29(4):365–371
Brazma A, Sarkans U, Robinson A, Vilo J, Vingron M, Hoheisel J, Fellenberg K (2002) Microarray data representation, annotation and storage. Adv Biochem Eng Biotechnol 77:113–139
Brazma A, Parkinson H, Sarkans U, Shojatalab M, Vilo J, Abeygunawardena N, Holloway E, Kapushesky M, Kemmeren P, Lara GG, Oezcimen A, Rocca-Serra P, Sansone SA (2003) ArrayExpress—a public repository for microarray gene expression data at the EBI. Nucleic Acids Res 31(1):68–71
Brown PO, Botstein D (1999) Exploring the new world of the genome with DNA microarrays. Nat Genet 21(Suppl 1):33–37
Burja AM, Dhamwichukorn S, Wright PC (2003) Cyanobacterial postgenomic research and systems biology. Trends Biotechnol 21(11):504–511
Calevro F, Charles H, Reymond N, Dugas V, Cloarec JP, Bernillon J, Rahbe Y, Febvay G, Fayard JM (2004) Assessment of 35mer amino-modified oligonucleotide based microarray with bacterial samples. J Microbiol Methods 57(2):207–218
Cao M, Salzberg L, Tsai CS, Mascher T, Bonilla C, Wang T, Ye RW, Marquez-Magana L, Helmann JD (2003) Regulation of the Bacillus subtilis extracytoplasmic function protein sigma(Y) and its target promoters. J Bacteriol 185(16):4883–4890
Chan K, Baker S, Kim CC, Detweiler CS, Dougan G, Falkow S (2003) Genomic comparison of Salmonella enterica serovars and Salmonella bongori by use of an S. enterica serovar typhimurium DNA microarray. J Bacteriol 185(2):553–563
Cheng Y, Church GM (2000) Biclustering of expression data. Proc Int Conf Intell Syst Mol Biol 8:93–103
Cheung VG, Morley M, Aguilar F, Massimi A, Kucherlapati R, Childs G (1999) Making and reading microarrays. Nat Genet 21(Suppl 1):15–19
Chhabra SR, He Q, Huang KH, Gaucher SP, Alm EJ, He Z, Hadi MZ, Hazen TC, Wall JD, Zhou J, Arkin AP, Singh AK (2006) Global analysis of heat shock response in Desulfovibrio vulgaris Hildenborough. J Bacteriol 188(5):1817–1828
Churchill GA (2002) Fundamentals of experimental design for cDNA microarrays. Nat Genet 32(Suppl):490–495
Clare A, King RD (2002) How well do we understand the clusters found in microarray data? In Silico Biol 2(4):511–522
Claverie JM (1999) Computational methods for the identification of differential and coordinated gene expression. Hum Mol Genet 8(10):1821–1832
Coenye T, Gevers D, Van de Peer Y, Vandamme P, Swings J (2005) Towards a prokaryotic genomic taxonomy. FEMS Microbiol Rev 29(2):147–167
Conway T, Schoolnik GK (2003) Microarray expression profiling: capturing a genome-wide portrait of the transcriptome. Mol Microbiol 47(4):879–889
Conway T, Kraus B, Tucker DL, Smalley DJ, Dorman AF, McKibben L (2002) DNA array analysis in a Microsoft Windows environment. Biotechniques 32(1):110, 112–114, 116, 118–119
Cui X, Churchill GA (2003) Statistical tests for differential expression in cDNA microarray experiments. Genome Biol 4(4):210
de Hoon MJ, Imoto S, Nolan J, Miyano S (2004) Open source clustering software. Bioinformatics 20(9):1453–1454
den Hengst CD, van Hijum SA, Geurts JM, Nauta A, Kok J, Kuipers OP (2005) The Lactococcus lactis CodY regulon: identification of a conserved cis-regulatory element. J Biol Chem 280(40):34332–34342
DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PS, Ray M, Chen Y, Su YA, Trent JM (1996) Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet 14(4):457–460
DeRisi JL, Iyer VR, Brown PO (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278(5338):680–686
Dharmadi Y, Gonzalez R (2004) DNA microarrays: experimental issues, data analysis, and application to bacterial systems. Biotechnol Prog 20(5):1309–1324
Dudley AM, Aach J, Steffen MA, Church GM (2002) Measuring absolute expression with microarrays with a calibrated reference sample and an extended signal intensity range. Proc Natl Acad Sci USA 99(11):7554–7559
Duggan DJ, Bittner M, Chen Y, Meltzer P, Trent JM (1999) Expression profiling using cDNA microarrays. Nat Genet 21(Suppl 1):10–14
Edgar R, Domrachev M, Lash AE (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30(1):207–210
Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95(25):14863–14868
Emrich SJ, Lowe M, Delcher AL (2003) PROBEmer: a Web-based software tool for selecting optimal DNA oligos. Nucleic Acids Res 31(13):3746–3750
Ferea TL, Brown PO (1999) Observing the living genome. Curr Opin Genet Dev 9(6):715–722
Fodor SP, Read JL, Pirrung MC, Stryer L, Lu AT, Solas D (1991) Light-directed, spatially addressable parallel chemical synthesis. Science 251(4995):767–773
Galitski T (2004) Molecular networks in model systems. Annu Rev Genomics Hum Genet 5:177–187
Gao H, Wang Y, Liu X, Yan T, Wu L, Alm E, Arkin A, Thompson DK, Zhou J (2004) Global transcriptome analysis of the heat shock response of Shewanella oneidensis. J Bacteriol 186(22):7796–7803
Garge NR, Page GP, Sprague AP, Gorman BS, Allison DB (2005) Reproducible clusters from microarray research: whither? BMC Bioinformatics 6(Suppl 2):S10
Getz G, Levine E, Domany E (2000) Coupled two-way clustering analysis of gene microarray data. Proc Natl Acad Sci USA 97(22):12079–12084
Ghosh SS, Musso GF (1987) Covalent attachment of oligonucleotides to solid supports. Nucleic Acids Res 15(13):5353–5372
Gibson UE, Heid CA, Williams PM (1996) A novel method for real time quantitative RT-PCR. Genome Res 6(10):995–1001
Granjeaud S, Bertucci F, Jordan BR (1999) Expression profiling: DNA arrays in many guises. Bioessays 21(9):781–790
Hecker M, Engelmann S (2000) Proteomics, DNA arrays and the analysis of still unknown regulons and unknown proteins of Bacillus subtilis and pathogenic gram-positive bacteria. Int J Med Microbiol 290(2):123–134
Hegde P, Qi R, Abernathy K, Gay C, Dharap S, Gaspard R, Hughes JE, Snesrud E, Lee N, Quackenbush J (2000) A concise guide to cDNA microarray analysis. Biotechniques 29(3):548–556
Heid CA, Stevens J, Livak KJ, Williams PM (1996) Real time quantitative PCR. Genome Res 6(10):986–994
Heller RA, Schena M, Chai A, Shalon D, Bedilion T, Gilmore J, Woolley DE, Davis RW (1997) Discovery and analysis of inflammatory disease-related genes using cDNA microarrays. Proc Natl Acad Sci USA 94(6):2150–2155
Helmann JD, Wu MF, Kobel PA, Gamo FJ, Wilson M, Morshedi MM, Navre M, Paddon C (2001) Global transcriptional response of Bacillus subtilis to heat shock. J Bacteriol 183(24):7318–7328
Herold KE, Rasooly A (2003) Oligo design: a computer program for development of probes for oligonucleotide microarrays. Biotechniques 35(6):1216–1221
Hessner MJ, Singh VK, Wang X, Khan S, Tschannen MR, Zahrt TC (2004) Utilization of a labeled tracking oligonucleotide for visualization and quality control of spotted 70-mer arrays. BMC Genomics 5(1):12
Holland MJ (2002) Transcript abundance in yeast varies over six orders of magnitude. J Biol Chem 277(17):14363–14366
Hughes TR, Mao M, Jones AR, Burchard J, Marton MJ, Shannon KW, Lefkowitz SM, Ziman M, Schelter JM, Meyer MR, Kobayashi S, Davis C, Dai H, He YD, Stephaniants SB, Cavet G, Walker WL, West A, Coffey E, Shoemaker DD, Stoughton R, Blanchard AP, Friend SH, Linsley PS (2001) Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat Biotechnol 19(4):342–347
Ideker T, Galitski T, Hood L (2001) A new approach to decoding life: systems biology. Annu Rev Genomics Hum Genet 2:343–372
Kafatos FC, Jones CW, Efstratiadis A (1979) Determination of nucleic acid sequence homologies and relative concentrations by a dot hybridization procedure. Nucleic Acids Res 7(6):1541–1552
Kane MD, Jatkoe TA, Stumpf CR, Lu J, Thomas JD, Madore SJ (2000) Assessment of the sensitivity and specificity of oligonucleotide (50mer) microarrays. Nucleic Acids Res 28(22):4552–4557
Kanehisa M, Goto S, Kawashima S, Nakaya A (2002) The KEGG databases at GenomeNet. Nucleic Acids Res 30(1):42–46
Kerr MK, Churchill GA (2001a) Experimental design for gene expression microarrays. Biostatistics 2(2):183–201
Kerr MK, Churchill GA (2001b) Statistical design and the analysis of gene expression microarray data. Genet Res 77(2):123–128
Khan J, Simon R, Bittner M, Chen Y, Leighton SB, Pohida T, Smith PD, Jiang Y, Gooden GC, Trent JM, Meltzer PS (1998) Gene expression profiling of alveolar rhabdomyosarcoma with cDNA microarrays. Cancer Res 58(22):5009–5013
Khodursky AB, Bernstein JA, Peter BJ, Rhodius V, Wendisch VF, Zimmer DP (2003) Escherichia coli spotted double-strand DNA microarrays. In: Brownstein MJ, Khodursky AB (eds) Functional genomics. Humana, Totowa, pp 61–78
Kim CC, Joyce EA, Chan K, Falkow S (2002) Improved analytical methods for microarray-based genome-composition analysis. Genome Biol 3(11):RESEARCH0065
Knight J (2001) When the chips are down. Nature 410(6831):860–861
Kroll TC, Wolfl S (2002) Ranking: a closer look on globalisation methods for normalisation of gene expression arrays. Nucleic Acids Res 30(11):e50
Kromer JO, Sorgenfrei O, Klopprogge K, Heinzle E, Wittmann C (2004) In-depth profiling of lysine-producing Corynebacterium glutamicum by combined analysis of the transcriptome, metabolome, and fluxome. J Bacteriol 186(6):1769–1784
Kucho K, Okamoto K, Tsuchiya Y, Nomura S, Nango M, Kanehisa M, Ishiura M (2005) Global analysis of circadian expression in the cyanobacterium Synechocystis sp. strain PCC 6803. J Bacteriol 187(6):2190–2199
Kushner SR (1996) mRNA decay. In: Neidhard FC, Curtiss R, Ingraham JL et al (eds) Escherichia coli and Salmonella: cellular and molecular biology. ASM, Washington, DC, pp 849–860
Kushner SR (2002) mRNA decay in Escherichia coli comes of age. J Bacteriol 184(17):4658–4665; discussion 4657
Lander ES (1999) Array of hope. Nat Genet 21(Suppl 1):3–4
Lashkari DA, DeRisi JL, McCusker JH, Namath AF, Gentile C, Hwang SY, Brown PO, Davis RW (1997) Yeast microarrays for genome wide parallel genetic and gene expression analysis. Proc Natl Acad Sci USA 94(24):13057–13062
Laub MT, McAdams HH, Feldblyum T, Fraser CM, Shapiro L (2000) Global analysis of the genetic network controlling a bacterial cell cycle. Science 290(5499):2144–2148
Lee ML, Kuo FC, Whitmore GA, Sklar J (2000) Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc Natl Acad Sci USA 97(18):9834–9839
Lehner A, Loy A, Behr T, Gaenge H, Ludwig W, Wagner M, Schleifer KH (2005) Oligonucleotide microarray for identification of Enterococcus species. FEMS Microbiol Lett 246(1):133–142
Lemmo AV, Rose DJ, Tisone TC (1998) Inkjet dispensing technology: applications in drug discovery. Curr Opin Biotechnol 9(6):615–617
Lindroos HL, Mira A, Repsilber D, Vinnere O, Naslund K, Dehio M, Dehio C, Andersson SG (2005) Characterization of the genome composition of Bartonella koehlerae by microarray comparative genomic hybridization profiling. J Bacteriol 187(17):6155–6165
Lipshutz RJ, Fodor SP, Gingeras TR, Lockhart DJ (1999) High density synthetic oligonucleotide arrays. Nat Genet 21(Suppl 1):20–24
Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H, Brown EL (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14(13):1675–1680
Loy A, Lehner A, Lee N, Adamczyk J, Meier H, Ernst J, Schleifer KH, Wagner M (2002) Oligonucleotide microarray for 16S rRNA gene-based detection of all recognized lineages of sulfate-reducing prokaryotes in the environment. Appl Environ Microbiol 68(10):5064–5081
Loy A, Schulz C, Lucker S, Schopfer-Wendels A, Stoecker K, Baranyi C, Lehner A, Wagner M (2005) 16S rRNA gene-based oligonucleotide microarray for environmental monitoring of the betaproteobacterial order “Rhodocyclales”. Appl Environ Microbiol 71(3):1373–1386
Lu Y, Lu S, Fotouhi F, Deng Y, Brown SJ (2004) Incremental genetic K-means algorithm and its application in gene expression data analysis. BMC Bioinformatics 5:172
Mascher T, Margulis NG, Wang T, Ye RW, Helmann JD (2003) Cell wall stress responses in Bacillus subtilis: the regulatory network of the bacitracin stimulon. Mol Microbiol 50(5):1591–1604
Michaels GS, Carr DB, Askenazi M, Fuhrman S, Wen X, Somogyi R (1998) Cluster analysis and data visualization of large-scale gene expression data. Pac Symp Biocomput 42–53
Mitterer G, Huber M, Leidinger E, Kirisits C, Lubitz W, Mueller MW, Schmidt WM (2004) Microarray-based identification of bacteria in clinical samples by solid-phase PCR amplification of 23S ribosomal DNA sequences. J Clin Microbiol 42(3):1048–1057
Mlecnik B, Scheideler M, Hackl H, Hartler J, Sanchez-Cabo F, Trajanoski Z (2005) PathwayExplorer: Web service for visualizing high-throughput expression data on biological pathways. Nucleic Acids Res 33(Web Server issue):W633–W637
Molenaar D, Bringel F, Schuren FH, de Vos WM, Siezen RJ, Kleerebezem M (2005) Exploring Lactobacillus plantarum genome diversity by using microarrays. J Bacteriol 187(17):6119–6127
Moore CM, Gaballa A, Hui M, Ye RW, Helmann JD (2005) Genetic and physiological responses of Bacillus subtilis to metal ion stress. Mol Microbiol 57(1):27–40
Mutch DM, Berger A, Mansourian R, Rytz A, Roberts MA (2002) The limit fold change model: a practical approach for selecting differentially expressed genes from microarray data. BMC Bioinformatics 3:17
Nadon R, Shoemaker J (2002) Statistical issues with microarrays: processing and analysis. Trends Genet 18(5):265–271
Neidhard FC, Ingraham JL, Schaechter M (1990) Physiology of the bacterial cell. Sinauer, Sunderland
Nguyen C, Rocha D, Granjeaud S, Baldit M, Bernard K, Naquet P, Jordan BR (1995) Differential gene expression in the murine thymus assayed by quantitative hybridization of arrayed cDNA clones. Genomics 29(1):207–216
Ogura M, Yamaguchi H, Kobayashi K, Ogasawara N, Fujita Y, Tanaka T (2002) Whole-genome analysis of genes regulated by the Bacillus subtilis competence transcription factor ComK. J Bacteriol 184(9):2344–2351
Oh MK, Liao JC (2000) Gene expression profiling by DNA microarrays and metabolic fluxes in Escherichia coli. Biotechnol Prog 16(2):278–286
Oh MK, Rohlin L, Kao KC, Liao JC (2002) Global expression profiling of acetate-grown Escherichia coli. J Biol Chem 277(15):13175–13183
Okamoto T, Suzuki T, Yamamoto N (2000) Microarray fabrication with covalent attachment of DNA using bubble jet technology. Nat Biotechnol 18(4):438–441
Pappas CT, Sram J, Moskvin OV, Ivanov PS, Mackenzie RC, Choudhary M, Land ML, Larimer FW, Kaplan S, Gomelsky M (2004) Construction and validation of the Rhodobacter sphaeroides 2.4.1 DNA microarray: transcriptome flexibility at diverse growth modes. J Bacteriol 186(14):4748–4758
Pashalidis S, Moreira LM, Zaini PA, Campanharo JC, Alves LM, Ciapina LP, Vencio RZ, Lemos EG, Da Silva AM, Da Silva AC (2005) Whole-genome expression profiling of Xylella fastidiosa in response to growth on glucose. OMICS 9(1):77–90
Paustian ML, May BJ, Cao D, Boley D, Kapur V (2002) Transcriptional response of Pasteurella multocida to defined iron sources. J Bacteriol 184(23):6714–6720
Paustian ML, Kapur V, Bannantine JP (2005) Comparative genomic hybridizations reveal genetic regions within the Mycobacterium avium complex that are divergent from Mycobacterium avium subsp. paratuberculosis isolates. J Bacteriol 187(7):2406–2415
Petrov A, Shah S, Draghici S, Shams S (2002) Microarray image processing and quality control. In: Shah S, Kamberova G (eds) DNA array image analysis—nuts & bolts. DNA, Eagleville, pp 99–130
Phimister B (1999) Going global. Nat Genet 21(Suppl 1):1
Polen T, Rittmann D, Wendisch VF, Sahm H (2003) DNA microarray analyses of the long-term adaptive response of Escherichia coli to acetate and propionate. Appl Environ Microbiol 69(3):1759–1774
Polen T, Kramer M, Bongaerts J, Wubbolts M, Wendisch VF (2005) The global gene expression response of Escherichia coli to L-phenylalanine. J Biotechnol 115(3):221–237
Quackenbush J (2002) Microarray data normalization and transformation. Nat Genet 32(Suppl):496–501
Ramdas L, Wang J, Hu L, Cogdell D, Taylor E, Zhang W (2001) Comparative evaluation of laser-based microarray scanners. Biotechniques 31(3):546, 548, 550, passim
Ramsay G (1998) DNA chips: state-of-the art. Nat Biotechnol 16(1):40–44
Record MT, Reznikoff WS, Craig ML, McQuade KL, Schlax PJ (1996) Escherichia coli RNA polymerase, sigma70, promotors, and the kinetics of the steps of transcription initiation. In: Neidhard FC, Curtiss R, Ingraham JL et al (eds) Escherichia coli and Salmonella: cellular and molecular biology. ASM, Washington, DC, pp 792–821
Reed KC, Mann DA (1985) Rapid transfer of DNA from agarose gels to nylon membranes. Nucleic Acids Res 13(20):7207–7221
Reen FJ, Boyd EF, Porwollik S, Murphy BP, Gilroy D, Fanning S, McClelland M (2005) Genomic comparisons of Salmonella enterica serovar Dublin, Agona, and Typhimurium strains recently isolated from milk filters and bovine samples from Ireland, using a Salmonella microarray. Appl Environ Microbiol 71(3):1616–1625
Rhodius V, Van Dyk TK, Gross C, LaRossa RA (2002) Impact of genomic technologies on studies of bacterial gene expression. Annu Rev Microbiol 56:599–624
Richardson JP, Greenblatt J (1996) Control of RNA chain elongation and termination. In: Neidhard FC, Curtiss R, Ingraham JL et al (eds) Escherichia coli and Salmonella: cellular and molecular biology. ASM, Washington, DC, pp 822–848
Rossignol T, Dulau L, Julien A, Blondin B (2003) Genome-wide monitoring of wine yeast gene expression during alcoholic fermentation. Yeast 20(16):1369–1385
Rouillard JM, Zuker M, Gulari E (2003) OligoArray 2.0: design of oligonucleotide probes for DNA microarrays using a thermodynamic approach. Nucleic Acids Res 31(12):3057–3062
Saito I, Sugiyama H, Furukawa N, Matsuura T (1981) Photoreaction of thymidine with primary amines. Application to specific modification of DNA. Nucleic Acids Symp Ser 10:61–64
Salama N, Guillemin K, McDaniel TK, Sherlock G, Tompkins L, Falkow S (2000) A whole-genome microarray reveals genetic diversity among Helicobacter pylori strains. Proc Natl Acad Sci USA 97(26):14668–14673
Salmon KA, Hung SP, Steffen NR, Krupp R, Baldi P, Hatfield GW, Gunsalus RP (2005) Global gene expression profiling in Escherichia coli K12: effects of oxygen availability and ArcA. J Biol Chem 280(15):15084–15096
Sanchez-Cortes S, Berenguel RM, Madejon A, Perez-Mendez M (2002) Adsorption of polyethyleneimine on silver nanoparticles and its interaction with a plasmid DNA: a surface-enhanced Raman scattering study. Biomacromolecules 3(4):655–660
Schena M, Davis RW (2000) Technology standards for microarray research. In: Schena M (ed) Microarray biochip technology. Eaton, Natick, pp 1–18
Schena M, Shalon D, Davis RW, Brown PO (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270(5235):467–470
Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW (1996) Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc Natl Acad Sci USA 93(20):10614–10619
Schuchhardt J, Beule D, Malik A, Wolski E, Eickhoff H, Lehrach H, Herzel H (2000) Normalization strategies for cDNA microarrays. Nucleic Acids Res 28(10):E47
Searles RP (2003) Arrays for the masses-setting up a microarray facility. In: Blalock E (ed) A beginner's guide to microarrays. Kluwer, Boston, pp 123–149
Shchepinov MS, Case-Green SC, Southern EM (1997) Steric factors influencing hybridisation of nucleic acids to oligonucleotide arrays. Nucleic Acids Res 25(6):1155–1161
Sherlock G (2000) Analysis of large-scale gene expression data. Curr Opin Immunol 12(2):201–205
Sherlock G, Ball CA (2005) Storage and retrieval of microarray data and open source microarray database software. Mol Biotechnol 30(3):239–251
Smulski DR, Huang LL, McCluskey MP, Reeve MJ, Vollmer AC, Van Dyk TK, LaRossa RA (2001) Combined, functional genomic-biochemical approach to intermediary metabolism: interaction of acivicin, a glutamine amidotransferase inhibitor, with Escherichia coli K-12. J Bacteriol 183(11):3353–3364
Southern EM (1975) Detection of specific sequences among DNA fragments separated by gel electrophoresis. J Mol Biol 98(3):503–517
Southern EM (2001) DNA microarrays. History and overview. Methods Mol Biol 170:1–15
Southern E, Mir K, Shchepinov M (1999) Molecular interactions on microarrays. Nat Genet 21(Suppl 1):5–9
Spellman PT, Miller M, Stewart J, Troup C, Sarkans U, Chervitz S, Bernhart D, Sherlock G, Ball C, Lepage M, Swiatek M, Marks WL, Goncalves J, Markel S, Iordan D, Shojatalab M, Pizarro A, White J, Hubley R, Deutsch E, Senger M, Aronow BJ, Robinson A, Bassett D, Stoeckert CJ Jr, Brazma A (2002) Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol 3(9):RESEARCH0046
Spruill SE, Lu J, Hardy S, Weir B (2002) Assessing sources of variability in microarray gene expression data. Biotechniques 33(4):916–920, 922–923
Stabler RA, Marsden GL, Witney AA, Li Y, Bentley SD, Tag CMM, Hinds J (2005) Identification of pathogen-specific genes through microarray analysis of pathogenic and commensal Neisseria species. Microbiology 151(Pt 9):2907–2922
Sturn A, Quackenbush J, Trajanoski Z (2002) Genesis: cluster analysis of microarray data. Bioinformatics 18(1):207–208
Talaat AM, Hunter P, Johnston SA (2000) Genome-directed primers for selective labeling of bacterial transcripts for DNA microarray analysis. Nat Biotechnol 18(6):679–682
Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR (1999) Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 96(6):2907–2912
Tao H, Gonzalez R, Martinez A, Rodriguez M, Ingram LO, Preston JF, Shanmugam KT (2001) Engineering a homo-ethanol pathway in Escherichia coli: increased glycolytic flux and levels of expression of glycolytic genes during xylose fermentation. J Bacteriol 183(10):2979–2988
Tseng GC, Oh MK, Rohlin L, Liao JC, Wong WH (2001) Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res 29(12):2549–2557
Vemuri GN, Aristidou AA (2005) Metabolic engineering in the -omics era: elucidating and modulating regulatory networks. Microbiol Mol Biol Rev 69(2):197–216
Warrington JA, Dee S, Trulson M (2000) Large-scale genomic analysis using Affymetrix GeneChip probe arrays. In: Schena M (ed) Microarray biochip technology. Eaton, Natick, pp 119–148
Wen X, Fuhrman S, Michaels GS, Carr DB, Smith S, Barker JL, Somogyi R (1998) Large-scale temporal gene expression mapping of central nervous system development. Proc Natl Acad Sci USA 95(1):334–339
Wendisch VF, Zimmer DP, Khodursky A, Peter B, Cozzarelli N, Kustu S (2001) Isolation of Escherichia coli mRNA and comparison of expression using mRNA and total RNA on DNA microarrays. Anal Biochem 290(2):205–213
Wildsmith SE, Elcock FJ (2001) Microarrays under the microscope. Mol Pathol 54(1):8–16
Wildsmith SE, Archer GE, Winkley AJ, Lane PW, Bugelski PJ (2001) Maximization of signal derived from cDNA microarrays. Biotechniques 30(1):202–206, 208
Worley J, Bechtol J, Penn S, Roach D (2000) A systems approach to fabricating and analyzing DNA microarrays. In: Schena M (ed) Microarray biochip technology. Eaton, Natick, pp 65–85
Wurmbach E, Yuen T, Sealfon SC (2003) Focused microarray analysis. Methods 31(4):306–316
Xiang CC, Brownstein MJ (2003) Fabrication of cDNA microarrays. In: Brownstein MJ, Khodursky A (eds) Functional genomics: methods and protocols. Humana, Totowa, pp 1–7
Xu Y (2004) Microarray gene expression data analysis. In: Zhou J, Thompson DK, Xu Y, Tiedje JM (eds) Microbial functional genomics. Wiley, Hoboken, pp 177–206
Yang YH, Speed T (2002) Design issues for cDNA microarray experiments. Nat Rev Genet 3(8):579–588
Yang IV, Chen E, Hasseman JP, Liang W, Frank BC, Wang S, Sharov V, Saeed AI, White J, Li J, Lee NH, Yeatman TJ, Quackenbush J (2002a) Within the fold: assessing differential expression measures and reproducibility in microarray assays. Genome Biol 3(11):RESEARCH0062
Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP (2002b) Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res 30(4):e15
Ye RW, Wang T, Bedzyk L, Croker KM (2001) Applications of DNA microarrays in microbial systems. J Microbiol Methods 47(3):257–272
Yuen T, Wurmbach E, Pfeffer RL, Ebersole BJ, Sealfon SC (2002) Accuracy and calibration of commercial oligonucleotide and custom cDNA microarrays. Nucleic Acids Res 30(10):e48
Zammatteo N, Jeanmart L, Hamels S, Courtois S, Louette P, Hevesi L, Remacle J (2000) Comparison between different strategies of covalent attachment of DNA to glass surfaces to build DNA microarrays. Anal Biochem 280(1):143–150
Zhou J, Thompson DK (2004) DNA microarray technology. In: Zhou J, Thompson DK, Xu Y, Tiedje JM (eds) Microbial functional genomics. Wiley, Hoboken, pp 141–176
Zhou YX, Kalocsai P, Chen JY, Shams S (2000) Information processing issues and solutions associated with microarray technology. In: Schena M (ed) Microarray biochip technology. Eaton, Natick, pp 167–200
