
NEWS&ANALYSIS

GENOMICS
ENCODE Project Writes Eulogy for Junk DNA

When researchers first sequenced the human genome, they were astonished by how few traditional genes encoding proteins were scattered along those 3 billion DNA bases. Instead of the expected 100,000 or more genes, the initial analyses found about 35,000 and that number has since been whittled down to about 21,000. In between were megabases of “junk,” or so it seemed.

This week, 30 research papers, including six in Nature and additional papers published by Science, sound the death knell for the idea that our DNA is mostly littered with useless bases. A decadelong project, the Encyclopedia of DNA Elements (ENCODE), has found that 80% of the human genome serves some purpose, biochemically speaking. “I don’t think anyone would have anticipated even close to the amount of sequence that ENCODE has uncovered that looks like it has functional importance,” says John A. Stamatoyannopoulos, an ENCODE researcher at the University of Washington, Seattle.

Beyond defining proteins, the DNA bases highlighted by ENCODE specify landing spots for proteins that influence gene activity, strands of RNA with myriad roles, or simply places where chemical modifications serve to silence stretches of our chromosomes. These results are going “to change the way a lot of [genomics] concepts are written about and presented in textbooks,” Stamatoyannopoulos predicts.

The insights provided by ENCODE into how our DNA works are already clarifying genetic risk factors for a variety of diseases and offering a better understanding of gene regulation and function. “It’s a treasure trove of information,” says Manolis Kellis, a computational biologist at Massachusetts Institute of Technology (MIT) in Cambridge who analyzed data from the project.

The ENCODE effort has revealed that a gene’s regulation is far more complex than previously thought, being influenced by multiple stretches of regulatory DNA located both near and far from the gene itself and by strands of RNA not translated into proteins, so-called noncoding RNA. “What we found is how beautifully complex the biology really is,” says Jason Lieb, an ENCODE researcher at the University of North Carolina, Chapel Hill.

Throughout the 1990s, various researchers called the idea of junk DNA into question. With the human genome in hand, the National Human Genome Research Institute (NHGRI) in Bethesda, Maryland, decided it wanted to find out once and for all how much of the genome was a wasteland with no functional purpose. In 2003, it funded a pilot ENCODE, in which 35 research teams analyzed 44 regions of the genome (30 million bases in all, about 1% of the total genome). In 2007, the pilot project’s results revealed that much of this DNA sequence was active in some way. The work called into serious question our gene-centric view of the genome, finding extensive RNA-generating activity beyond traditional gene boundaries (Science, 15 June 2007, p. 1556). But the question remained whether the rest of the genome was like this 1%. “We want to know what all the bases are doing,” says Yale University bioinformatician Mark Gerstein.

Teams at 32 institutions worldwide have now carried out scores of tests, generating 1640 data sets. While the pilot phase tests depended on computer chip–like devices called microarrays to analyze DNA samples, the expanded phase benefited from the arrival of new sequencing technology, which made it cost-effective to directly read the DNA bases. Taken together, the tests present “a greater idea of what the landscape of the genome looks like,” says NHGRI’s Elise Feingold.

Because the parts of the genome used could differ among various kinds of cells, ENCODE needed to look at DNA function in multiple types of cells and tissues. At first the goal was to study intensively three types of cells. They included GM12878, the immature white blood cell line used in the 1000 Genomes Project, a large-scale effort to catalog genetic variation across humans; a leukemia cell line called K562; and an approved human embryonic stem cell line, H1-hESC.

As ENCODE was ramping up, new sequencing technology brought the cost of sequencing down enough to make it feasible to test extensively even more cell types. ENCODE added a liver cancer cell line, HepG2; the laboratory workhorse cancer cell line, HeLa S3; and human umbilical cord tissue to the mix. Another 140 cell types were studied to a much lesser degree.

In these cells, ENCODE researchers closely examined which DNA bases are transcribed into RNA and then whether those strands of RNA are subsequently translated into proteins, verifying predicted protein-coding genes and more precisely locating each gene’s beginning, end, and coding regions. The latest protein-coding gene count is 20,687, with hints of about 50 more, the consortium reports in Nature. Those genes account for about 3% of the human genome, less if one counts only their coding regions. Another 11,224 DNA stretches are classified as pseudogenes, “dead” genes now known to be active in some cell types or individuals.

Figure: Zooming in. A diagram of DNA in ever-greater detail shows how ENCODE’s various tests (gray boxes) translate DNA’s features into functional elements along a chromosome.
Figure labels: hypersensitive sites; RNA polymerase; CH3CO and CH3 (epigenetic modifications); tests: 5C, DNase-seq, FAIRE-seq, ChIP-seq, computational predictions and RT-PCR, RNA-seq; gene; transcript; long-range regulatory elements (enhancers, repressors/silencers, insulators); cis-regulatory elements (promoters, transcription factor binding sites).
CREDIT: ADAPTED FROM THE ENCODE PROJECT CONSORTIUM, PLOS BIOLOGY 9, 4 (APRIL 2011)

Downloaded from http://science.sciencemag.org/ on March 22, 2020

www.sciencemag.org SCIENCE VOL 337 7 SEPTEMBER 2012 1159


Published by AAAS

ENCODE drives home, however, that there are many “genes” out there in which DNA codes for RNA, not a protein, as the end product. The big surprise of the pilot project was that 93% of the bases studied were transcribed into RNA; in the full genome, 76% is transcribed. ENCODE defined 8800 small RNA molecules and 9600 long noncoding RNA molecules, each of which is at least 200 bases long. Thomas Gingeras of Cold Spring Harbor Laboratory in New York has found that various ones home in on different cell compartments, as if they have fixed addresses where they operate. Some go to the nucleus, some to the nucleolus, and some to the cytoplasm, for example. “So there’s quite a lot of sophistication in how RNA works,” says Ewan Birney of the European Bioinformatics Institute in Hinxton, U.K., one of the key leaders of ENCODE (see p. 1162).

As a result of ENCODE, Gingeras and others argue that the fundamental unit of the genome and the basic unit of heredity should be the transcript, the piece of RNA decoded from DNA, and not the gene. “The project has played an important role in changing our concept of the gene,” Stamatoyannopoulos says.

Another way to test for functionality of DNA is to evaluate whether specific base sequences are conserved between species, or among individuals in a species. Previous studies have shown that 5% of the human genome is conserved across mammals, even though ENCODE studies implied that much more of the genome is functional. So MIT’s Lucas Ward and Kellis compared functional regions newly identified by ENCODE among multiple humans, sampling from the 1000 Genomes Project. Some DNA sequences not conserved between humans and other mammals were nonetheless very much preserved across multiple people, indicating that an additional 4% of the genome is newly under selection in the human lineage, they report in a paper published online by Science (http://scim.ag/WardKellis). Two such regions were near genes for nerve growth and the development of cone cells in the eye, which underlie distinguishing traits in humans. On the flip side, they also found that some supposedly conserved regions of the human genome, as highlighted by the comparison with 29 mammals, actually varied among humans, suggesting these regions were no longer functional.

Beyond transcription, DNA’s bases function in gene regulation through their interactions with transcription factors and other proteins. ENCODE carried out several tests to map where those proteins bind along the genome (Science, 25 May 2007, p. 1120). Two, DNase-seq and FAIRE-seq, gave an overview of the genome, identifying where the protein-DNA complex chromatin unwinds and a protein can hook up with the DNA, and were applied to multiple cell types. ENCODE’s DNase-seq found 2.89 million such sites in 125 cell types. Stamatoyannopoulos and his colleagues describe their more extensive DNase-seq studies in Science (p. 1190): His team examined 349 types of cells, including 233 60- to 160-day-old fetal tissue samples. Each type of cell had about 200,000 accessible locations, and there seemed to be at least 3.9 million regions where transcription factors can bind in the genome. Across all cell types, about 42% of the genome can be accessible, he and his colleagues report. In many cases, the assays were able to pinpoint the specific bases involved in binding.

Last year, Stamatoyannopoulos showed that these newly discovered functional regions sometimes overlap with specific DNA bases linked to higher or lower risks of various diseases, suggesting that the regulation of genes might be at the heart of these risk variations (Science, 27 May 2011, p. 1031). The work demonstrated how researchers could use ENCODE data to come up with new hypotheses about the link between genetics and a particular disorder. (The ENCODE analysis found that 12% of these bases, or SNPs, colocate with transcription factor binding sites and 34% are in open chromatin defined by the DNase-seq tests.) Now, in their new work published in Science, Stamatoyannopoulos’s lab has linked those regulatory regions to their specific target genes, homing in on the risk-enhancing ones. In addition, the group finds it can predict the cell type involved in a given disease. For example, the analysis fingered two types of T cells as pathogenic in Crohn’s disease, both of which are involved in this inflammatory bowel disorder. “We are informing disease studies in a way that would be very hard to do otherwise,” Birney says.

Another test, called ChIP-seq, uses an antibody to home in on a particular DNA-binding protein and helps pinpoint the locations along the genome where that protein works. To date, ENCODE has examined about 100 of the 1500 or so transcription factors and about 20 other DNA binding proteins, including those involved in modifying the chromatin-associated proteins called histones. The binding sites found through ChIP-seq coincided with the sites mapped through FAIRE-seq and DNase-seq. Overall, 8% of the genome falls within a transcription factor binding site, a percentage that is expected to double once more transcription factors have been tested.

Yale’s Gerstein used these results to figure out all the interactions among the transcription factors studied and came up with a network view of how these regulatory proteins work. These transcription factors formed a three-layer hierarchy, with the ones at the top having the broadest effects and the ones in the middle working together to coregulate a common target gene, he and his colleagues report in Nature.

Using a technique called 5C, other researchers looked for places where DNA from distant regions of a chromosome, or even different chromosomes, interacted. It found that an average of 3.9 distal stretches of DNA linked up with the beginning of each gene. “Regulation is a 3D puzzle that has to be put together,” Gingeras says. “That’s what ENCODE is putting out on the table.”

To date, NHGRI has put $288 million toward ENCODE, including the pilot project, technology development, and ENCODE efforts for the mouse, nematode, and fruit fly. All together, more than 400 papers have been published by ENCODE researchers. Another 110 or more studies have used ENCODE data, says NHGRI molecular biologist Michael Pazin. Molecular biologist Mathieu Lupien of the University of Toronto in Canada authored one of those papers, a study looking at epigenetics and cancer. “ENCODE data were fundamental” to the work, he says. “The cost is definitely worth every single dollar.”
–ELIZABETH PENNISI

ENCODE By the Numbers
147 cell types studied
80% functional portion of human genome
20,687 protein-coding genes
18,400 RNA genes
1640 data sets
30 papers published this week
442 researchers
$288 million funding for pilot, technology, model organism, and current project



ENCODE Project Writes Eulogy for Junk DNA
Elizabeth Pennisi

Science 337 (6099), 1159-1161.


DOI: 10.1126/science.337.6099.1159



ARTICLE TOOLS http://science.sciencemag.org/content/337/6099/1159

RELATED CONTENT http://science.sciencemag.org/content/sci/337/6099/1167.full
http://science.sciencemag.org/content/sci/337/6099/1190.full
http://science.sciencemag.org/content/sci/337/6102/1675.full

PERMISSIONS http://www.sciencemag.org/help/reprints-and-permissions

Use of this article is subject to the Terms of Service

Science (print ISSN 0036-8075; online ISSN 1095-9203) is published by the American Association for the Advancement of
Science, 1200 New York Avenue NW, Washington, DC 20005. The title Science is a registered trademark of AAAS.
Copyright © 2012, American Association for the Advancement of Science
