ENCODE Project Writes Eulogy For Junk DNA
GENOMICS
When researchers first sequenced the human genome, they were astonished by how few traditional genes encoding proteins were scattered along those 3 billion DNA bases. Instead of the expected 100,000 or more genes, the initial analyses found about 35,000, and that number has since been whittled down to about 21,000. In between were megabases of “junk,” or so it seemed.

This week, 30 research papers, including six in Nature and additional papers published by Science, sound the death knell for the idea that our DNA is mostly littered with […] how our DNA works are already clarifying genetic risk factors for a variety of diseases and offering a better understanding of gene regulation and function. “It’s a treasure trove of information,” says Manolis Kellis, a computational biologist at Massachusetts Institute of Technology (MIT) in Cambridge who analyzed data from the project.

[Figure labels: Hypersensitive sites; RNA; CH3CO (Epigenetic modifications)]

The ENCODE effort has revealed that a gene’s regulation is far more complex than previously thought, being influenced by multiple stretches of regulatory DNA located both near and far from the gene itself and by strands of RNA not translated into proteins, so-called noncoding RNA. “What we found is how beautifully complex the biology really is,” says Jason Lieb, an ENCODE researcher at the University of North Carolina, Chapel Hill.

Throughout the 1990s, various researchers called the idea of junk DNA into question. With the human genome in hand, the National Human Genome Research Institute (NHGRI) in Bethesda, Maryland, decided it wanted to find out once and for all how much of the genome was a wasteland with no functional purpose. In 2003, it funded a pilot phase of ENCODE, the Encyclopedia of DNA Elements, in which 35 research teams analyzed 44 regions of the genome: 30 million […] 2007, the pilot project’s results revealed that much of this DNA sequence was active in some way. The work called into serious question our gene-centric view of the genome, finding extensive RNA-generating activity beyond traditional gene boundaries (Science, 15 June 2007, p. 1556). But the question remained whether the rest of the genome was like this 1%. “We want to know what all the bases are doing,” says Yale University bioinformatician Mark Gerstein.

Teams at 32 institutions worldwide have now carried out scores of tests, generating 1640 data sets. Whereas the pilot-phase tests depended on computer chip–like devices called microarrays to analyze DNA samples, the expanded phase benefited from the arrival of new sequencing technology, which made it cost-effective to read the DNA bases directly. Taken together, the tests present “a greater idea of what the landscape of the genome looks like,” says NHGRI’s Elise Feingold.

Because the parts of the genome in use could differ among various kinds of cells, ENCODE needed to look at DNA function in multiple types of cells and tissues. At first the goal was to study three types of cells intensively. They included GM12878, the immature white blood cell line used in the 1000 Genomes Project, a large-scale effort to catalog genetic variation across humans; a leukemia cell line called K562; and an approved human embryonic stem cell line, H1-hESC. As ENCODE was ramping up, new sequencing technology brought the cost of […] to test even more cell types extensively. ENCODE added a liver cancer cell line, HepG2; the laboratory workhorse cancer cell line, HeLa S3; and human umbilical cord tissue to the mix. Another 140 cell types were studied to a much lesser degree.

In these cells, ENCODE researchers closely examined which DNA bases are transcribed into RNA and then whether those strands of RNA are subsequently translated into proteins, verifying predicted protein-coding genes and more precisely locating each gene’s beginning, end, and coding regions. The latest protein-coding gene count is 20,687, with hints of about 50 more, the consortium reports in Nature. Those genes account for about 3% of the human genome, less if one counts only their coding regions. Another 11,224 DNA stretches are classified as pseudogenes, “dead” genes now known to be active in some cell types or individuals.
ENCODE drives home, however, that there are many “genes” out there in which DNA codes for RNA, not a protein, as the end product. The big surprise of the pilot project was that 93% of the bases studied were transcribed into RNA; in the full genome, 76% is transcribed. ENCODE defined 8800 small RNA molecules and 9600 long noncoding RNA molecules, each of the latter at least 200 bases long. Thomas Gingeras of Cold Spring Harbor Laboratory in New York has found that different RNAs home in on different cell compartments, as if they have fixed addresses where they operate. Some go to the nucleus, some to the nucleolus, and some to the cytoplasm, for example. “So there’s quite a lot of sophistication in how RNA works,” says Ewan Birney of the European Bioinformatics Institute in Hinxton, U.K., one of the key lead[…]

[…]actions with transcription factors and other proteins. ENCODE carried out several tests to map where those proteins bind along the genome (Science, 25 May 2007, p. 1120). Two of them, DNase-seq and FAIRE-seq, gave an overview of the genome, identifying where chromatin, the protein-DNA complex, unwinds so that a protein can hook up with the DNA, and were applied to multiple cell types. ENCODE’s DNase-seq found 2.89 million such sites in 125 cell types. Stamatoyannopoulos and his colleagues describe their more extensive DNase-seq studies in Science (p. 1190): His team examined 349 types of cells, including 233 fetal tissue samples aged 60 to 160 days. Each type of cell had about 200,000 accessible locations, and there seemed to be at least 3.9 million regions where transcription factors can bind in the genome. Across all cell […]

[…]ifying the chromatin-associated proteins called histones. The binding sites found through ChIP-seq coincided with the sites mapped through FAIRE-seq and DNase-seq. Overall, 8% of the genome falls within a transcription factor binding site, a percentage that is expected to double once more transcription factors have been tested.

ENCODE By the Numbers
147 cell types studied
80% functional portion of human genome
20,687 protein-coding genes
18,400 RNA genes
1640 data sets
30 papers published this week
RELATED CONTENT
http://science.sciencemag.org/content/sci/337/6099/1167.full
http://science.sciencemag.org/content/sci/337/6099/1190.full
http://science.sciencemag.org/content/sci/337/6102/1675.full
Science (print ISSN 0036-8075; online ISSN 1095-9203) is published by the American Association for the Advancement of
Science, 1200 New York Avenue NW, Washington, DC 20005. The title Science is a registered trademark of AAAS.
Copyright © 2012, American Association for the Advancement of Science