

Evidence for large domains of similarly expressed genes in the Drosophila genome

Background
Transcriptional regulation in eukaryotes generally operates at the level of individual
genes.
Results
We searched for groups of adjacent and similarly expressed genes in Drosophila
using high-density oligonucleotide microarrays, and found 200 groups, each covering
between 20 and 200 kilobase pairs of genomic sequence, with a mean group size of
about 100 kilobase pairs.

Background
Regulating gene expression is fundamental to every cell and involves altering
transcription rates. Promoter activity is modulated by sequence-specific transcription
factors that interact with the protein complexes or with the promoter sequence itself.
In eukaryotes, transcription factors can modify the activity of promoters by binding to
regulatory modules hundreds to thousands of base pairs away from the promoter.
These regulatory modules can either increase or decrease the rate of transcription
for a target gene, depending on the cellular state and the activities of the bound
transcription factors.
Although the expression of individual genes can be tightly controlled, it can also be
influenced by neighboring genes or chromatin regions. Insulators seem to prevent the
influence of cis-regulatory modules from being propagated along the chromosome.
Two recent observations suggest that genomes may be divided into domains
important for controlling the expression of groups of adjacent genes. In addition,
about 50 regions of the human genome show a strong clustering of highly expressed
genes.
Many neighboring genes show similar expression patterns
We collected relative gene-expression profiles from 267 Affymetrix GeneChip
Drosophila Genome Arrays and measured the extent to which physically adjacent
genes share strikingly similar expression profiles.
We calculated the average pair-wise Pearson correlation of gene expression for
genes in a sliding ten-gene window across the genome and found that groups of
physically adjacent genes with similar expression are common; nearly 1,100 such
groups are significant at a p value of 10^-2.
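The sliding-window score described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `pearson` and `window_scores` are hypothetical, and the input is assumed to be a list of expression profiles already sorted in chromosomal order.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def window_scores(profiles, window=10):
    """Average pair-wise correlation for every window of adjacent genes.

    `profiles` is assumed to be a list of expression profiles in
    chromosomal order. Returns one score per window start position.
    """
    scores = []
    for start in range(len(profiles) - window + 1):
        genes = profiles[start:start + window]
        pairs = list(combinations(genes, 2))
        scores.append(sum(pearson(a, b) for a, b in pairs) / len(pairs))
    return scores
```

A ten-gene window averages 45 pair-wise correlations, so a single strongly correlated pair cannot produce a high score on its own.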
To ensure that ten-gene windows were appropriate, we repeated the analysis using
windows of various sizes. We found that most groups include about ten genes, so
we used a window size of ten for the remainder of our analysis.
Many ten-gene groups that have high average pair-wise correlations of gene
expression represent physically overlapping stretches of genes. We collapsed all
groups that bordered one another into a single group, showing that the effect on
expression extends well beyond ten genes.
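One way to picture the collapsing step is as interval merging over the start positions of significant windows. The sketch below makes two assumptions: windows are identified by the index of their first gene, and "bordering" means overlapping or directly adjacent. The name `merge_windows` is hypothetical.

```python
def merge_windows(starts, window=10):
    """Collapse overlapping or bordering windows into (first_gene, last_gene) groups.

    `starts` holds the start indices of windows that passed the threshold.
    """
    groups = []
    for start in sorted(starts):
        end = start + window - 1
        if groups and start <= groups[-1][1] + 1:
            # Window overlaps or directly borders the previous group: extend it.
            groups[-1][1] = max(groups[-1][1], end)
        else:
            groups.append([start, end])
    return [tuple(g) for g in groups]
```

For example, significant windows starting at genes 0, 3 and 10 merge into a single group spanning genes 0 to 19, which is how a group can end up larger than the window used to find it.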
The 44 groups, comprising 681 genes, that map to the left arm of chromosome two and
have a p value less than 10^-2 are shown in Figure 2. The length of genomic sequence
occupied by similarly expressed gene groups is highly variable.
Gene groups are not explained by gene function or homology
We found that many genes that are related by function share similar expression
patterns, and that this is also true for homologous genes, particularly those that
arose from recent duplications.
We considered an extreme model in which evolutionary selection has organized
gene groups according to the biological processes the genes are involved in, so that
their expression can be coordinately regulated. We used the Gene Ontology
database to test this model. For the 211 groups identified in the full dataset and the
176 groups identified in the homologs-removed dataset, 43 and 11 GO terms,
respectively, are significantly associated with a group. The observed enrichment
therefore depends largely on homologs.
Similarly expressed gene groups can be identified from smaller datasets
We divided our dataset in two, one part with RNA samples taken from embryos and
one with samples taken from adults (primarily males), and calculated the average
pair-wise correlations for all groups of genes in each of the two new datasets. The
numbers of groups are remarkably similar to those found for the entire dataset.
We calculated the correlation between the adult, embryo and combined datasets and
found that the average correlation is about 0.35, while the average correlation
between the adult and embryo datasets is lower (about 0.23). The number of genes
involved makes little difference.
Correlations with known chromosome structures
The position of Drosophila chromosome attachments has been proposed to be
precise, but there is very limited mapping data on the position of these attachments.
We attempted to align the regions between attachment sites with the gene groups, but
found no clear overlaps.

Discussion
We found that 20% of the genes in the Drosophila genome appear to fall into groups
of 10-30 genes that are expressed similarly across a wide range of experimental
conditions. We believe that the findings are most consistent with regulation at the
level of chromatin structure.
Discussions of transcriptional regulation often emphasize the belief that the process
is tightly controlled and error-free. However, we believe that the degree of precision
is less than is generally assumed, and that many genes are allowed to vary
considerably in their regulation.
If coordinated gene expression is unimportant, there should be no selection driving
the groups of co-regulated genes to be evolutionarily conserved.
Although we have assayed a relatively large number of biological samples, we
cannot infer the profiles of unique cellular states.
Data collection
We collected a dataset of 88 experimental conditions hybridized to 267 GeneChip
Drosophila Genome Arrays from six independent investigations that study five
different experimental questions - aging, DNA-damage response, immune response,
resistance to DDT, and embryonic development.
Data processing
GeneChip Drosophila Genome Array intensity data were calculated from the images
generated by the GeneChip scanner, and a dataset was generated by averaging
data for each transcript on replicate arrays and subtracting the average log ratio of
each gene in the dataset.
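The two processing steps can be sketched for a single gene as below. This assumes the values are already on a log scale and are keyed by condition name. Both function names are hypothetical, and the sketch does not reproduce the scanner software's intensity calculation.

```python
def average_replicates(values_by_condition):
    """Average replicate measurements for one gene.

    `values_by_condition` maps a condition name to a list of log-scale
    values, one per replicate array (an assumed input layout).
    """
    return {cond: sum(vals) / len(vals) for cond, vals in values_by_condition.items()}


def center_gene(profile):
    """Subtract the gene's own mean so each value is relative to its average."""
    mean = sum(profile.values()) / len(profile)
    return {cond: value - mean for cond, value in profile.items()}
```

Centering each gene makes profiles comparable between genes with very different absolute expression levels. The Pearson correlation itself is unaffected by the shift.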
Identification of adjacent similarly regulated genes
We calculated the average pair-wise correlation between gene-expression profiles
for all genes that were within n genes of one another using the Pearson correlation.
We also calculated the average pair-wise correlation for a random dataset.
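A random dataset can serve as a null distribution for the observed window scores. The sketch below assumes the randomization shuffles gene order, which the text does not specify. `score_windows` stands for any caller-supplied function that maps an ordered list of profiles to a list of window scores, and `empirical_p_values` is a hypothetical name.

```python
import random


def empirical_p_values(profiles, score_windows, n_shuffles=100, seed=0):
    """Estimate a p value for each observed window score.

    Assumption: the null distribution is built by shuffling gene order;
    the source does not say how its random dataset was constructed.
    """
    rng = random.Random(seed)
    observed = score_windows(profiles)
    null_scores = []
    for _ in range(n_shuffles):
        shuffled = profiles[:]
        rng.shuffle(shuffled)
        null_scores.extend(score_windows(shuffled))
    total = len(null_scores)
    # Add-one correction keeps the estimate above zero for finite shuffles.
    return [
        (sum(1 for s in null_scores if s >= obs) + 1) / (total + 1)
        for obs in observed
    ]
```

With this estimator the smallest attainable p value is 1 / (total + 1), so the number of shuffles has to be large enough to resolve a threshold such as 10^-2.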
We searched for cases in which homologs were near each other in the genome, and
removed 1,369 genes from the dataset. This dataset was subjected to the average
pair-wise algorithm, as was a randomized version of it.
We constructed two non-overlapping subsets of the total data matrix, and applied the
average pair-wise correlation algorithm to each dataset.
Significantly enriched GO terms among gene groups
The probability of observing a GO term with a specified set of genes was calculated
using the hypergeometric distribution.
Many sets of GO terms were tested on many groups of genes, and only those
significantly associated with a group of similarly expressed genes were recorded.
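The hypergeometric calculation can be illustrated with a short sketch. It computes the upper-tail probability of seeing at least `hits` annotated genes in a group. Whether the original analysis used this tail or the point probability is not stated here, so treat that choice, and the name `hypergeom_tail`, as assumptions.

```python
from math import comb


def hypergeom_tail(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes without replacement.

    population: genes assayed; annotated: genes carrying the GO term;
    sample: genes in the group; hits: group genes carrying the term.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total
```

Because many GO terms are tested against many groups, some correction for multiple testing would normally be applied to these probabilities. The text does not say which correction, if any, was used.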
Correlation of groups with known chromosomal structures
We determined the number of polytene bands present in each group of similarly
expressed genes and calculated the number of bands that overlapped randomly
placed groups.
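The comparison with randomly placed groups can be sketched as a simple placement test. The sketch assumes that bands and groups are inclusive (start, end) coordinate intervals on a single chromosome arm and that random groups keep the length of the observed group. The function names are hypothetical.

```python
import random


def bands_overlapped(group, bands):
    """Count bands whose interval overlaps the group's (start, end) interval."""
    g_start, g_end = group
    return sum(1 for b_start, b_end in bands if b_start <= g_end and b_end >= g_start)


def random_placement_counts(group_length, chrom_length, bands, n_trials=1000, seed=0):
    """Band counts for groups of the same length placed at random positions."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_trials):
        # randint is inclusive at both ends, so the group always fits.
        start = rng.randint(0, chrom_length - group_length)
        counts.append(bands_overlapped((start, start + group_length - 1), bands))
    return counts
```

Comparing the observed band count for a group with the distribution returned by `random_placement_counts` indicates whether that group covers unusually few or unusually many bands.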

Additional data files


The supplemental materials include a tab-delimited text file of the underlying
expression data, the Perl scripts used to process the data, and a text file used to
generate Figure 2. The data can be loaded into the TreeView software to visualize
individual groups.
