
Genomics and Postgenomics

Stephan Guttinger & John Dupré

First published Thu Oct 20, 2016

About 30 years ago researchers and other stakeholders started setting up the first genomics
initiative, the Human Genome Project (HGP) (see the link to All About the Human Genome
Project (HGP) in the Other Internet Resources section below). What was conceived as an
audacious plan in the 1980s turned into an official multi-centre, international program in 1990
and was brought to a conclusion in 2003.

More than a decade later genomics is still big in business (and big business): the Obama
administration announced in January 2015 that they intend to sequence one million human
genomes (see Precision Medicine Initiative in the Other Internet Resources section below; see
also Reardon 2015). Craig Venter, the commercially minded nemesis of the publicly-funded
HGP, is also in the mix again, this time involved in a privately-funded collaboration that aims to
sequence two million genomes over the course of the next ten years (Ledford 2016). And
equally important, we see not only the same players clash again but also the same promises
being made, with talk of “groundbreaking health benefits” and “new medical breakthroughs”
appearing once again in press releases and other announcements (see for instance Collins &
Varmus 2015 or NIH 2015).

But many things are also different now. For instance, China has emerged as a major player in
the genomics field, with the BGI (formerly the Beijing Genomics Institute) already announcing
in 2011 the aim to sequence one million genomes. Moreover, DNA sequencing is no longer the
only goal of these large-scale initiatives: the new genomics is of course still a genome-based
effort, but it is a transformed enterprise that also focuses on data about proteins, DNA
methylation[1] patterns or the physiology and the environment of the people studied; DNA
sequence data now forms only part of a much larger picture in the push for what is called
‘precision’ or ‘personalised’ medicine. Developments such as these have led many to refer to
the present as a ‘postgenomic’ age (Richardson & Stevens 2015). The goal of this entry is to
look at this constantly developing space of genomic and postgenomic research and outline some
of the central philosophical issues it raises.

Section 1 will introduce and discuss several key terms, such as ‘genome’ or ‘genomics’. Section
2 will then turn to the question of what it means to read and interpret the genome. What did the
sequencing and the mapping of the human genome entail and what philosophical issues arose in
the context of the human genome project? How did sequencing evolve into a much larger
‘postgenomic’ enterprise and what issues did this transformation bring about? To answer the
last question Section 3 will consider two different projects, perhaps newly emerging fields,
namely the HapMap project, and metagenomics. In the supplementary document The ENCODE
Project and the ENCODE Controversy, we will look at the ENCODE project and the
controversy that surrounded it. These three cases will highlight key issues that come up again
and again in the context of genomics and postgenomics.

It is also important to point out what this article is not about. There are already a number of
entries in the Stanford Encyclopedia of Philosophy (SEP) that deal specifically with genes,
genetics and also the HGP, and the present entry will not, therefore, address in much detail the
history of, or the philosophical issues surrounding, the concept of the ‘gene’ (see SEP entry
gene, but also the entries molecular biology, molecular genetics, and the human genome
project), or the history of the HGP (see SEP entry, the human genome project). Broader issues
that also play a role in genomics, such as the notion of biological information and the issue of
reductionism have also been discussed in a set of SEP entries (for more on reductionism see
reductionism in biology; gene; HGP; and molecular genetics and for more on the metaphor of a
‘genetic program’ and biological information see entries on biological information; gene;
molecular genetics; and molecular biology). Furthermore, and probably most importantly of all,
our focus here will be on the epistemological, ontological and methodological issues raised by
genomics rather than the ethical, legal and social issues that the sequencing of DNA inevitably
brings up (but see HGP entry for more on these topics).

• 1. Terminology and Definitions
  o 1.1 Gene—Genome—Genomics
  o 1.2 What is a Genome?
• 2. Reading the Genome
  o 2.1 The Run-up to the HGP
  o 2.2 The Results and Impact of the HGP
  o 2.3 Genome Size, the C-value Paradox and Junk DNA
• 3. Beyond Sequencing
  o 3.1 The International HapMap Project
  o 3.2 Metagenomics
• 4. Outlook
  o Supplement: The ENCODE Project and the ENCODE Controversy
• Bibliography

1. Terminology and Definitions

1.1 Gene—Genome—Genomics

The term ‘genomics’ derives from the term ‘genome’, which itself derives (in part) from the
term ‘gene’. The meanings of—and the relationships between—these different terms are by no
means simple.

The term ‘gene’ was introduced in 1909 by the Danish biologist Wilhelm Johannsen, who used
it to refer to the (then uncharacterised) elements that specify the inherited characteristics of an
organism (see gene and molecular genetics entries for an overview of the complex history of the
term ‘gene’).

The term ‘genome’ was introduced in 1920 by the German botanist Hans Winkler (1877–1945)
in his publication “Verbreitung und Ursache der Parthenogenesis im Pflanzen- und Tierreiche”
(Prevalence and Cause of Parthenogenesis in the Plant and Animal Kingdom). Winkler defined
the term as follows:

Ich schlage vor, für den haploiden Chromosomensatz, der im Verein mit dem zugehörigen
Protoplasma die materielle Grundlage der systematischen Einheit darstellt, den Ausdruck: das
Genom zu verwenden […]. (Winkler 1920: 165)

I propose to use the expression ‘genome’ for the haploid set of chromosomes that, in
conjunction with the associated protoplasm, represents the material foundation of the systematic
unit [often translated as ‘species’]. (Translation by S.G.)

The etymology of the term is not clear but most authors and encyclopaedia entries assume that it
is a combination of the German words ‘Gen’ and ‘Chromosom’, leading to the composite
‘Genom’. In general, the origin and the different meanings of the -ome suffix are not entirely
clear and there are now several accounts that try to bring some structure and/or meaning to the
ever flourishing -omes terminology in contemporary life sciences (see, e.g., Lederberg &
McCray 2001; Fields & Johnston 2002; Yadav 2007; Eisen 2012; Baker 2013; for
interesting/entertaining lists, see -omes and -omics in the Other Internet Resources section
below).

The term ‘genomics’, finally, was invented in 1986 at a meeting of several scientists who were
brainstorming (in a bar) to come up with a name for a new journal that Frank Ruddle (Yale
University) and Victor McKusick (Johns Hopkins University) were setting up. The aim of this
journal was to publish data on the sequencing, mapping and comparison of genomes. To capture
these different activities—and in analogy to the well-established discipline of genetics—
Thomas Roderick (Jackson Laboratory) proposed the term ‘genomics’ (Kuska 1998).
Unbeknownst to the people involved, this was a significant moment in the history of the life
sciences, as it is here that the -omics suffix appears for the first time.

1.2 What is a Genome?

Looking at the history and the etymology of a term does not, of course, necessarily tell us a lot
about how it is used in the context of current science. So what is a genome in today’s life
sciences? Is it the (haploid) set of chromosomes we find in the nucleus of a eukaryotic cell, in
line with the original definition by Winkler? Or is it the totality of genes we find in an organism
or the totality of DNA present in a cell? And if so, which DNA? Most definitions that are
currently in circulation are an intricate mix of different ways of approaching the issue. This can
be illustrated by looking at the definitions given in several key online resources (for more
definitions of the term ‘genome’ see Table 1 in Keller 2011).

The glossary on the genome.gov website defines the term as follows:

The genome is the entire set of genetic instructions found in a cell. In humans, the genome
consists of 23 pairs of chromosomes, found in the nucleus, as well as a small chromosome
found in the cells’ mitochondria. Each set of 23 chromosomes contains approximately 3.1
billion bases of DNA sequence. (Talking Glossary: genome, in the Other Internet Resources)

And this is how the U.S. National Library of Medicine defines it:

A genome is an organism’s complete set of DNA, including all of its genes. Each genome
contains all of the information needed to build and maintain that organism. In humans, a copy of
the entire genome—more than 3 billion DNA base pairs—is contained in all cells that have a
nucleus. (NIH 2016)

Similarly the education portal of the journal Nature:

A genome is the complete set of genetic information in an organism. It provides all of the
information the organism requires to function. In living organisms, the genome is stored in long
molecules of DNA called chromosomes. (Scitable: genome, in the Other Internet Resources)

All of these definitions refer both to information and to instructions for the development and/or
functioning of an organism. In the first two, the genome is also identified with a material entity,
in the first case the chromosomes, in the second a sequence of base pairs. Nature allows only
that the information is “stored in” the chromosomes.

The combination of these two aspects is highly problematic. The definition from the U.S.
National Library of Medicine implies that “all of the information needed to build and maintain
that organism” is contained in the DNA,[2] which is certainly false: many environmental factors,
not to mention factors in the maternal cytoplasm, are required for the first task, and even more
obviously (food, light, etc.) for the second. Moreover when, as is almost always the case, an
organism requires symbiotic partners for its proper functioning, such a definition will imply that
the DNA of these symbionts is part of the genome of the first organism, a result that few would
welcome.[3] The Nature definition commits the same error in its second sentence. The
genome.gov definition identifies the chromosomes both with a set of instructions and with a
material entity, which rather problematically conflates a material object with an abstract entity.

The problem is not hard to see. Definitions that attempt to combine aspects of the material base
of the genome with its informational content, as all of these do, inevitably assume some simple
relation between the two; but in fact the relationship is extremely complex. Because the
informational content of the genome is dependent in multiple ways on elements that are not, on
any account, part of the genome, an account purely in terms of informational content seems a
hopeless project.

One commonly held view that can be quickly dismissed is the idea that the genome is just the
sum total of an organism’s genes. Even setting aside the well-known problems with saying what
a gene is (see Barnes & Dupré 2008; SEP entry on the gene), on any tenable account of genes
there is far more to the genome than genes, and only a fraction of the actual DNA contained in
the chromosomes would be part of the genome, at least in the case of humans and other
organisms that have a relatively large amount of non-coding DNA (Barnes & Dupré 2008:
76).[4] Even if ‘gene’ is interpreted in the widest possible sense,
including any section of the genome that has some identifiable function, no one denies that a
significant amount of DNA is not functional. The rest of the DNA would not form part of the
genome, an outcome that contradicts all definitions of the genome of which we are aware, and
makes nonsense of such familiar concepts as ‘whole-genome sequencing’, which refers to the
analysis of all the DNA found in the chromosomes.

There are, we suggest, two initially tenable approaches to the problem.[5] The first, and one that
is often implicitly or explicitly assumed to be correct, is to define the genome as the sequence of
nucleotides. This may or may not include extranuclear DNA, such as that in mitochondria or
chloroplasts; the genome.gov definition explicitly includes the former. This last question figured
prominently in
debates over the moral permissibility of so-called mitochondrial transplants (a designation that
speaks volumes, incidentally, about the almost magical importance attached to DNA as opposed
to the remaining contents of the cell), but it is not one of great philosophical significance. The
alternative approach is to understand the genome strictly as a material object, presumably, in
most cases, the nuclear chromosomes.

The problem with the first approach is that it is largely motivated by the assumption that the
nucleotide sequence is what contains all the important information in the genome. But in fact it
has become increasingly clear that this is not the case, especially as a result of the growing
understanding of epigenetics. Epigenetics is the study of material modifications of the genome
that affect what parts of the genome sequence are or are not transcribed into RNA, the first stage
of the process by which the genome influences the containing organism. The two best-studied
classes of epigenetic modification are methylation, the attachment of a methyl group (-CH3) to
one of the four nucleotides (cytosine), and various chemical modifications of the histone
proteins, the proteins that form the core structure of the chromosomes and around which the
DNA double helix is wrapped (Bickmore & van Steensel 2013; Cutter & Hayes 2015). The nucleotide
sequence, then, provides the (extremely large) set of possible transcripts that the genome can
produce, but the epigenetic state of the genome determines which transcripts are actually
produced (Jones 2012). Both features of the genome (qua material object) must be specified,
therefore, if we want to understand the biologically relevant behavior of the whole system.

So if the motivation for defining the genome in terms of sequence is to capture its informational
content, the definition fails to serve its goal. Indeed, the definition that will come closest to this
goal is that which identifies the genome as the material object, the set of chromosomes (this
interpretation of the genome is defended in detail in Barnes & Dupré 2008). An implication of
this definition that is often taken to be counterintuitive by biologists is that the genome will on
this account encompass not only DNA, but the histone proteins that are material parts of the
chromosomes. But of course the point of the preceding discussion is that the variable chemical
states of the histones are, in fact, essential bases for some of the information inherent in the
genome.

The phenomenon of methylation makes a similar point in a slightly different way. The
nucleotides that comprise the familiar sequence are cytosine, thymine, adenine and guanine.
When a methyl group attaches to the cytosine molecule the resultant nucleotide is not, strictly
speaking, cytosine, but 5-methyl cytosine. So unless one takes the letter ‘C’ in the standard
representation of sequence to mean, rather counterintuitively, “cytosine or 5-methyl cytosine”, it
More importantly, it is a representation that fails to capture crucial functional aspects of the
genome.
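This ambiguity can be made concrete in a short sketch (the function and naming convention
below are our own illustration, not an established tool): two fragments with the identical
four-letter sequence are chemically distinct once methylation state is recorded.

```python
# Illustrative sketch: a plain four-letter string cannot distinguish
# cytosine from 5-methyl cytosine; an extra annotation layer is needed.

def effective_sequence(bases, methylated_positions):
    """Render a sequence, writing 'm' for 5-methyl cytosine at methylated sites."""
    out = []
    for i, base in enumerate(bases):
        if base == "C" and i in methylated_positions:
            out.append("m")  # chemically 5-methyl cytosine, not plain cytosine
        else:
            out.append(base)
    return "".join(out)

fragment = "ACGTCCGA"

# The standard representation is identical in both cases...
unmethylated = effective_sequence(fragment, set())
methylated = effective_sequence(fragment, {4})

assert unmethylated == "ACGTCCGA"
# ...but recording methylation reveals a different molecule:
assert methylated == "ACGTmCGA"
```

The ‘m’ convention here is our own shorthand; the point is only that the letter string
underdetermines the epigenetic state of the molecule.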

A final telling point is that it has recently become clear that there are functions of the genome,
as material object, that go well beyond even the broadest interpretation of the genetic (Bustin &
Misteli 2016). It appears that the genome plays an essential role in a range of cellular processes.
First, its physical arrangement into domains of varying sizes plays a central role in the
coordination of gene expression. But, much further from the genetic, it is also a large object
whose mechanical properties are involved in various cellular processes and in cellular
homeostasis, and its chromatin fiber provides a scaffolding for both proteins and membranes
(Bustin & Misteli 2016). Unless we are to introduce a new word to refer to this biologically vital entity,
only a material conception of the genome can capture the full range of its activities.

One might be tempted to object to the argument above concerning methylation that whereas
methylation is a somewhat transitory state, the underlying four-letter sequence is extremely
durable, lasting across many generations. Richard Dawkins (1976) famously emphasized this
durability, arguing for the importance of genetic stability in evolution and even going so far as
to describe genes as “immortal”. So perhaps there is a good reason for understanding “C” as
referring to a disjunction.

This is not the place to address the quasi-theological view of gene immortality. However, this
does point to a fundamental issue about the nature of genes. Even if genes, somehow, were
unchanging immortal substances, the genome is nothing of the sort. It is an extremely dynamic
entity, constantly changing its properties in generally adaptive response to its environment.
Moreover, even the constancy of its nucleotide sequence is something maintained only by the
continuous application of various editing and repair mechanisms. Indeed, far from being an
eternal substance, we suggest it is much better seen as a process, a highly complex set of
dynamic activities crucial in maintaining the structural and functional stability not only of the
organism but also, through its role in reproduction, of the lineage. Importantly, these relations
are bi-directional and, specifically, the organism is also crucial to maintaining the necessary
aspects of stability of the genome.[6]

2. Reading the Genome

The first genome to be sequenced was that of a virus, namely bacteriophage ΦX174, sequenced
by Frederick Sanger in 1977 (Sanger et al. 1977). Up to about 1985, work on several other
viruses was initiated in different laboratories across the world and even the sequencing of model
organisms such as the bacterium Escherichia coli or the roundworm Caenorhabditis elegans
was being tackled.[7]

Of all the different sequencing efforts at the time the human genome project (HGP) of course
stands out. Not only is the human genome relatively large (roughly 3.2 billion base pairs (bps))
and of key interest to us as human beings, but the HGP itself was envisioned as a diverse large-
scale research project with various strands and aims. Getting the sequence out of this project
was the one goal that got the most attention in the wider media, but surely many would agree
that other findings and practices developed within the HGP were of equal or even greater
importance.

In what follows we will treat the HGP as a pivot around which genomics developed as a field of
research and as a set of techniques. For ease of exposition we will talk here of a pre-HGP and a
post-HGP phase. Obviously, this is a simplification; there is not just one single trajectory along
which the story of genomics runs and there is not one clear break between a pre- and a post-
genome era (Richardson & Stevens 2015). Nevertheless, as a way of structuring the discussion
this distinction will be a helpful tool.

2.1 The Run-up to the HGP

A decade after Sanger and Maxam and Gilbert published their DNA sequencing methods in the
mid-1970s, the first concrete talk of a human genome project started to appear in writing (Dulbecco
1986) and at different workshops (Sinsheimer 1989; Palca 1986). The Human Genome Project
(HGP) itself became a reality in 1990 when it was officially launched as a US federal program
(see 1990 in a brief history and timeline [NHGRI] in the Other Internet Resources section
below).

In the run-up to the HGP there were high expectations (some would say “hype”) developing,
which inevitably also brought critics of the project onto the scene (Koshland 1989; Luria et al.
1989). As so often, the issue of funding had a key role to play. When the HGP was initiated
there were no ‘big science’ projects being pursued in the life sciences; the HGP was therefore a
true first for biology. But such a large project, absorbing a significant proportion of the funding
allocated to the biological sciences, encountered considerable resistance from other scientists.

There were three key criticisms: 1) Some claimed that the HGP was a waste of money because
much useless (read: junk) DNA would be sequenced; the focus, they argued, should be on the
functional parts of the genome, i.e., the genes or regulatory elements, which could be studied
using simpler and less expensive methods (Brenner 1990; Weinberg 1991; Rechsteiner 1991;
Lewontin 1992; Rosenberg 1994). Others claimed 2) that the HGP was a waste of money as it
was merely a descriptive and not a hypothesis-driven project. This was an issue that became
much more prominent ten years after the project was finished, when it became clear that big
data science was here to stay (see, e.g., Weinberg 2010).[8]

And last but not least there was also the critique 3) that the HGP was fundamentally misguided
in assuming that sequence knowledge alone would enable an understanding of how our body
works and how it develops disease, and that this understanding would eventually lead to cures
for many diseases (Lewontin 1992; Tauber & Sarkar 1992; Kitcher 1994). This more general
critique of a narrowly sequence-focused approach to biomedical
issues also comes up 20 years later in discussions about the use of common genetic variants to
learn more about common diseases and traits (see Section 3.1.2).

It is difficult to evaluate criticisms of the last kind. There is no doubt that enthusiasm for the
HGP and many other successor projects in genomics has often been grounded in simplistic
assumptions about the power of DNA and its pre-eminent role in biological systems. On the
other hand it is arguable that many unanticipated benefits have derived from genomics quite
independently of such assumptions. For instance the ability to make very precise comparisons
of genome sequences has led to major advances in unraveling the details of evolutionary
history, not to mention its application to technologies such as forensic DNA testing. Moreover,
it can be argued with Waters (2007b) that what makes genomes so central to biological research
is not the erroneous belief that they are the ultimate causes of everything, but rather the unique
possibilities they present for precise intervention in organisms or cells.

2.2 The Results and Impact of the HGP

The main output of the HGP is usually seen as ‘the’ human genome sequence. The draft human
genome sequence (about 90% complete) was announced in June 2000, followed in 2001 by the
publication of the draft sequences produced by the HGP (International Human Genome
Sequencing Consortium 2001) and the privately funded initiative (Venter et al. 2001). The
complete (or almost complete, at 99%) sequence of the human genome was released in 2003,
which also marked the official ending of the HGP (International Human Genome Sequencing
Consortium 2004).

But the view that the sequence of ‘the’ human genome was the key output is wrong in several
ways. First of all there is in general no such thing as ‘the’ human genome, as each individual
(except for monozygotic twins) carries their own set of small and large variations in their
genome (and even twins accumulate many differences in their genomes during their lifetimes).
The sequence that was produced in the HGP is therefore nothing more than an
example of one particular sequence, meaning it can only serve as a reference genome.
Importantly, the reference sequences that both the HGP and Venter’s project delivered did not
correspond to the genome of a single person as the DNA used to produce them was derived
from several individuals.[9] The genomes that came out of the two sequencing efforts were
therefore composite reference sequences. But the HGP also produced much more than just a
DNA sequence. Here we will highlight three outcomes or aspects of the HGP that are of
particular importance, also for the period that followed the completion of the project.

One key feature of the HGP was that it involved the sequencing of a range of different model
organisms, an aspect of the HGP that was often overlooked in discussions of the project in the
philosophical literature and elsewhere (Ankeny 2001; for a searchable list of sequenced
genomes see genome information by organism in the Other Internet Resources section below).
The HGP provided not only a first reference genome of Homo sapiens but also the first bacterial
genome (Haemophilus influenzae, Fleischmann et al. 1995), the first eukaryotic genome
(Saccharomyces cerevisiae, Goffeau et al. 1996), and the genomes of key model organisms
(Escherichia coli, Blattner et al. 1997; Caenorhabditis elegans, C. elegans Sequencing
Consortium 1998; Arabidopsis thaliana, Arabidopsis Genome Initiative 2000; Drosophila
melanogaster, Adams et al. 2000, Myers et al. 2000).[10]

A further crucial output was the acceleration in technology development the HGP brought
about. It is safe to say that without the HGP (and subsequent initiatives such as the Advanced
Sequencing Technology Awards created in 2004 by the National Human Genome Research
Institute (NHGRI) (NIH 2004)) there wouldn’t have been such a rapid development in next-
generation sequencing (NGS) approaches and the cost of whole genome sequencing would not
have dropped as quickly as it has (see Mardis 2011 for a review of the development of NGS).
And these improvements in the sequencing technology had further consequences, for example
allowing scientists to sample DNA in different ways and from different sources, as new
sequencing methods could process more DNA material more quickly and work with less
starting material. This, finally, made possible whole new sub-disciplines, such as metagenomics
(see Section 3.2).

A final noteworthy output of the HGP is what scientists learned about the structure of the
genome. Beginning with the HGP, and building on further studies, researchers have gained a
much more detailed picture of the fine structure, the dynamics and the functioning of the human
genome. Not only were there many fewer genes present than expected; there was also far more
repetitive DNA and transposable-element content (it is estimated that about 45% of human
DNA consists of transposable elements or their inactive remnants).
findings relate to a more general and older discussion about genome size and complexity to
which we next turn.

2.3 Genome Size, the C-value Paradox and Junk DNA

It has been known since the 1950s that genome size varies greatly between different organisms
(Mirsky & Ris 1951; see also Gregory 2001), but from the very beginning it was also clear that
this diversity has some surprising features. One of these features is the absence of correlation
between the complexity of an organism and the size of its genome.

2.3.1 The C-value Paradox

Assuming an informational account of the genome one would expect that the more complex an
organism is, the more DNA its genome should contain (this is in fact what many biologists
assumed at least until about the 1960s). How to define and assess the complexity of an organism
is a tricky issue, but intuitively it seems reasonable to assume that a single-celled amoeba is less
complex than an onion, which in turn is less complex than a large metazoan such as a human
being, both in terms of the complexity of the workings and the structure of the organism. The
expectation was that the DNA content of human cells should be much larger than that of onions
or amoebae. As it turns out, however, both the onion and the amoeba have much larger genomes
than human beings. The onion, for instance, has a genome of about 16 billion base pairs,
meaning it is about five times the size of the human genome (Gregory 2007). The same lack of
correlation between genome size and complexity can be found in many other instances (for an
overview of different genome sizes see the animal genome size database in the Other Internet
Resources section below).
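For concreteness, the figures quoted above can be put side by side in a few lines of Python (the
numbers are the rough values given in the text, not precise measurements):

```python
# Approximate genome sizes in base pairs, as quoted in the text.
genome_size_bp = {
    "human": 3_200_000_000,   # ~3.2 billion bp
    "onion": 16_000_000_000,  # ~16 billion bp (Gregory 2007)
}

# The onion genome is roughly five times the size of the human genome,
# despite the onion being, intuitively, the less complex organism.
ratio = genome_size_bp["onion"] / genome_size_bp["human"]
assert round(ratio) == 5
```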

It was also found early on that very similar species in the same genus show large variation in
genome size, despite having similar phenotypes and karyotypes (i.e., number and shape of
chromosomes in a genome). Within the family of buttercups, for instance, DNA content varied
up to 80-fold (Rothfels et al. 1966). Also, Holm-Hansen (1969) showed that species of
unicellular algae display a 2000-fold difference in DNA content despite all being of similar
developmental complexity. It was findings such as these that gave a real urgency to addressing
this discrepancy, which was now labelled the C-value paradox (Thomas 1971). The term ‘C-value’
refers to the constant (‘C’) amount (‘value’) of haploid DNA per nucleus, measured in
picograms. The C-value is thus a measure of the amount of DNA each genome contains (we can
see here Winkler’s original definition of the genome at work).

2.3.2 Junk DNA

These discussions of genome sizes were closely related to concerns about gene numbers. And
this consideration of genome size vs. gene numbers is what originally gave rise to the concept of
‘junk DNA’ (Ohno 1972).[11] The reasoning behind this concept was the following: if one
assumes a) that more complex organisms will have more DNA than less complex organisms and
b) that gene numbers increase in proportion with genome size, then the genome of the more
complex organism should have more genes than the less complex one.[12] Human cells, for
instance, contain about 750 times more DNA than E. coli; given that E. coli has about 5000
genes, human cells would then be expected to contain somewhere in the range of 3.7 million
genes. This is clearly not the case;
even in the 1970s it was generally supposed that the human genome might contain no more than
150,000 genes (Crollius et al. 2000). This discrepancy led to the conclusion that the vast
majority of the DNA in our genome cannot consist of genes and is therefore what Ohno referred
to as ‘junk’.[13]
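The extrapolation behind this reasoning is simple enough to check directly; the following
back-of-the-envelope sketch uses the round figures quoted in the text:

```python
# Figures as quoted in the text; rough, order-of-magnitude values.
ecoli_genes = 5_000
dna_ratio_human_to_ecoli = 750  # human cells hold ~750x more DNA than E. coli

# If gene number scaled in proportion with genome size, humans would need:
expected_human_genes = ecoli_genes * dna_ratio_human_to_ecoli
assert expected_human_genes == 3_750_000  # the ~3.7 million cited in the text

# Current estimates put the actual figure at roughly 20,000 genes, so the
# naive extrapolation overshoots by more than two orders of magnitude.
actual_human_genes = 20_000
assert expected_human_genes // actual_human_genes == 187
```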

The problem that the junk DNA discussion brings up has also been referred to as the ‘G-value
paradox’ (‘G’ stands for ‘gene’), which directly concerns the discrepancy between the number
of genes in an organism and its complexity (Hahn & Wray 2002). This paradox has been
reinforced by the findings of the HGP. As Gregory (2005) and other commentators have pointed
out, the finding that the human genome contains many fewer genes than expected was one of
the most surprising outcomes of the HGP. Initial estimates from before the project were in the
range of 50,000 to 150,000. These were reduced to about 30,000–35,000 after the publication
of the first sequence draft in 2001 and have now been further revised to the order of 20,000
(Gregory 2001).

Some researchers assumed that the C-value paradox was fully resolved by the recognition that
there is non-coding DNA in genomes (Gregory 2001). Larger genome size in ‘simpler’
organisms merely means that they have large quantities of non-coding DNA. But as Gregory
points out, the fact that the majority of DNA in our genomes is non-coding might make the C-
value discrepancies less of a paradox, but it gives rise to a whole range of further puzzles
(Where does this extra DNA come from? What is its function? Etc.), which is why he proposes
to talk of the C-value as an enigma rather than a paradox (Gregory 2001). The C-value enigma
consists of many different and layered problems, which require a pluralistic approach, or so
Gregory claims.

The publication of the draft genome sequence in 2001 and the conclusion of the HGP in 2003
did not give researchers all the tools and insights they needed to tackle these long-standing
problems. But after the HGP, building on the initial sequencing effort, researchers could start to
go beyond the mere sequence and gain a deeper understanding of the workings of the genome.
This put them in a position to tackle issues such as the significance of junk DNA and the C-
value paradox more directly (or at least from a different angle). The post-HGP phase is also
characterized by an intense debate about the best way of doing research: the question of whether
biological research should best be done on a small or a large scale has come up again and again
in the post-HGP era, especially with the rise of other post-HGP large scale projects. The next
section will address two projects/research fields that symbolize the various efforts and
aspirations that were characteristic of the post-HGP era and which will help to illuminate some
of the philosophical issues these developments raised.

3. Beyond Sequencing

The post-HGP phase is marked by a flourishing of different projects, closely connected in their
origins to the HGP, but going beyond it in many different ways. This section discusses two such
post-HGP projects, namely the International HapMap project and a new field of research called
‘metagenomics’. These examples indicate some important directions in which the postgenomic
era is heading and identify some, though certainly not all, of the key characteristics and issues
that mark this new period.

3.1 The International HapMap Project

The International HapMap project was a multi-centre project launched in 2002 that came to an
initial conclusion in 2005 (NIH 2002).[14] The acronym ‘HapMap’ stands for ‘haplotype map’
and (indirectly) refers to the main goal of the project, namely to map the common genetic
variation in the human genome.

3.1.1 The HapMap Project and Genomic Variation

It is a well-known fact that everyone’s genome is different. There are, however, several ways in
which genomes of individuals can vary from each other, ranging from the deletion, insertion or
rearrangement of longer stretches of DNA to differences in single nucleotides at specific
locations on a chromosome. The latter form of variation was the focus of the HapMap project. If
we align the DNA sequences of two individuals, they will be identical for hundreds of
nucleotides at a time; the DNA of two human beings typically displays about 99.9% sequence identity
(Li & Sadler 1991; Wang et al. 1998; Cargill et al. 1999). But the 0.1% difference means that
approximately every 1000 nucleotides there will be a difference in a single nucleotide between
any two individuals.
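The arithmetic behind these figures is straightforward (a rough sketch; the genome length of about 3.2 billion base pairs is an assumed round figure, not a number given in the text):

```python
# Rough sketch of the arithmetic behind the figures above. The genome
# length used here (~3.2 billion base pairs) is an assumed round figure.
genome_length = 3_200_000_000  # base pairs (approximate)
difference_rate = 0.001        # ~0.1% of positions differ between two people

expected_differences = int(genome_length * difference_rate)
print(expected_differences)  # 3200000: about one difference per 1000 bases
```

This is consistent with the roughly 3 million single-nucleotide differences per individual reported below.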

Each variant of the DNA sequence at a specific genomic locus is referred to as an ‘allele’. If there are two different
versions of a specific gene that can be found in a population at a specific locus on a
chromosome, then that means that there are two different alleles of that gene present in that
population.[15] If one of these single nucleotide alleles is found in more than 1% of a specific
population it is treated as a ‘common’ variant and researchers speak of a ‘polymorphism’ or,
more precisely, a ‘single nucleotide polymorphism’ (abbreviated ‘SNP’; pronounced ‘snip’). If
a variation is found in less than 1% of the population researchers simply call it a ‘mutation’ (or
also a ‘point mutation’).[16] On average there are about 3 million SNPs found in each individual
and there is a pool of more than 10 million SNPs present in the human population as a whole
(HapMap 2005).

Many of these alleles are (or have an increased likelihood of being) inherited together, meaning
that they do not easily become separated through recombination events during meiosis.[17] This
leads to the non-random association of different alleles at two or more loci, a phenomenon that
has been dubbed ‘linkage disequilibrium’ or ‘LD’. The concept of LD is key for the HapMap
project as the fact that some SNPs stay associated (whereas the clusters themselves might get
separated from each other over time by recombination events) explains the haplotype structure
of the genome (Daly et al. 2001). The term ‘haplotype’ simply refers to a particular cluster of
alleles (in this case SNPs) that a) are on the same chromosome and b) are commonly inherited
as one. The aim of the HapMap project was to characterize human SNPs, their frequency in
different populations and the correlations between them (HapMap 2003). The first haplotype
map was published in 2005, reporting on data from 269 samples derived from four different
populations (HapMap 2005). Five years later, a follow up was published, now reporting on data
from 1184 individuals sampled from 11 different populations (HapMap 2010).
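Linkage disequilibrium can be quantified with simple summary statistics. The sketch below (illustrative only, not the HapMap consortium’s actual analysis pipeline) computes the standard D and r² measures for two biallelic SNPs from haplotype counts:

```python
# Minimal sketch of the standard LD statistics for two biallelic loci.
# Inputs are haplotype counts; illustrative only, not the HapMap pipeline.

def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic SNPs from haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n    # allele frequency at the first locus
    p_B = (n_AB + n_aB) / n    # allele frequency at the second locus
    D = p_AB - p_A * p_B       # deviation from random association
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, r2

# Perfect association: allele A always travels with allele B
D, r2 = ld_stats(n_AB=50, n_Ab=0, n_aB=0, n_ab=50)
print(D, r2)  # 0.25 1.0: complete linkage disequilibrium
```

An r² of 1 between two SNPs is what makes one of them usable as a ‘tag’ for the other, as discussed below.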

The realization that the structure of genetic variation in the genome can be understood in terms
of haplotypes was important for at least two reasons. First it opened the door for a relatively
easy and efficient analysis of (single nucleotide) genetic variation in populations: the clustering
of SNPs meant that in principle only one or a few of the SNPs in each cluster (so-called ‘tag
SNPs’) would have to be tested to verify the presence of the cluster of variants as a whole. This
made the analysis of genetic variation at the level of whole genomes from a large number of
subjects feasible at a time when whole-genome sequencing was still too expensive for such a
task (HapMap 2003). The development of a haplotype map was therefore a crucial step to
enable what are now called ‘genome-wide association studies’ (GWAS) (see Section 3.1.2).

Secondly, as the distribution of haplotypes varies between different populations, the HapMap
project had a strong focus on sampling DNA from different populations. This is an important
aspect of this type of research as it brought, unwittingly perhaps, the issue of race and the
question of its biological basis right back into genomics. This point will be revisited in Section
3.1.4.

3.1.2 The HapMap, GWAS and the Idea of Personalized Medicine

A key point driving the HapMap project was the fact that SNPs can be used to uncover
connections between an individual’s DNA sequence and specific conditions or traits. At face
value an SNP is simply a distinguishing mark in the genome of a person. Such marks allow
researchers to screen groups of a population with different phenotypes, for instance those with a
condition (e.g., high blood pressure) and those without. Looking at the frequency of specific
SNPs or haplotypes in either group the researchers can use statistical analysis to get insight into
the association between a particular SNP or haplotype and a trait (Cardon & Bell 2001). As
mentioned above, this analysis can be focused on tag SNPs that are treated as proxies for a
whole cluster of SNPs (if the cluster has a high LD).

Once a haplotype has been associated with a particular condition, other people can be screened
for the presence of that haplotype and therefore gain some understanding of the risk groups they
belong to. Although the test will not tell carriers of disease-linked SNPs whether they will
develop the condition or not, it can nevertheless give them some information about their
chances. Furthermore, even though the tag SNP itself might not be the genetic variation that
causes or contributes to the variation in phenotype, it might be linked to so-called ‘causal
SNPs’. Learning about SNPs associated with a condition or trait therefore can give the
researcher clues as to which genes or regulatory DNA regions might be causally involved in the
development of that condition. Findings from association studies can therefore in some cases
contribute to the analysis of the condition itself.
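The statistical comparison described above can be sketched in miniature as a chi-square test on a 2×2 table of allele counts in cases versus controls. The counts below are hypothetical, and a real GWAS additionally corrects for testing hundreds of thousands of SNPs at once:

```python
# Illustrative case/control association test for a single SNP: a 2x2
# chi-square on allele counts. The counts below are hypothetical.

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    expected = [
        (a + b) * (a + c) / n, (a + b) * (b + d) / n,
        (c + d) * (a + c) / n, (c + d) * (b + d) / n,
    ]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Risk allele counts per group (two alleles per person, hypothetical data):
# cases: 300 risk / 700 other; controls: 200 risk / 800 other
chi2 = chi_square_2x2(300, 700, 200, 800)
print(round(chi2, 1))  # 26.7; compare to a chi-square with 1 d.f.
```

A statistic this large would count as strong evidence of association for a single SNP, though in genome-wide scans much stricter thresholds are applied.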

The HapMap initially only looked for common variants (by definition, SNPs include only
common variants). This was in line with the so-called common disease/common variant
(CD/CV) hypothesis formulated by Lander (1996), Cargill et al. (1999), and Chakravarti
(1999).[18] This hypothesis postulates, roughly, that common conditions are linked to genetic
variations that are common in a population.

This link between common variants and common diseases also explains why the HapMap
project could be promoted from the very beginning as the ‘next big thing’ after the sequence of
the human genome had been determined: it was with the haplotype map that genomics was
expected to start having a real impact on biomedical research and, ultimately, on our understanding of
disease.[19]

3.1.3 The HapMap and its Critics

But the HapMap project was not without its critics; indeed the biologist David Botstein called it
a “magnificent failure” (cited in Hall 2010).[20] Some commentators, for instance, were worried
that the project was nothing more than a make-work project filling a gap that the finished HGP left
behind, and therefore a waste of precious funds (Couzin 2002). But more often, criticism of the
HapMap project was part of wider debates about the way post-HGP research should be
conducted. The HapMap project can therefore provide a useful window on some of the key
tendencies and disputes that marked (or marred) the post-HGP era.

One such indirect criticism of the HapMap derives from the apparent failure of GWAS to lead
researchers to clearer information about the links between our genetic makeup and the different
conditions to which our bodies can succumb. In the eyes of these critics the CD/CV hypothesis
was the key problem, as the common variants simply do not explain much of the heritability of
common diseases. This observation gave rise to the concept of ‘missing heritability’ (Eichler et
al. 2010).

The general focus on common variants in genomics was criticized by other authors who claimed
that the focus of geneticists should rather be on rare variants (McClellan & King 2010). These
rare variants, they claim, are where the missing heritability will be found. The problem with the
rare variants is that they cannot be picked up in GWAS that use SNP databases, as SNPs are by
definition common variants. Also, finding rare variants is a technical challenge as researchers
have to analyse the genomic data of a very large number of individuals to do so reliably. This
hunt for rare variants is a major reason behind the current push for the sequencing of millions
(rather than a couple of hundreds or thousands) of genomes. As discussed earlier, such large-
scale approaches have become feasible in recent years due to the reduced cost and increased
speed of next-generation DNA sequencing.

The current shift to whole-genome sequencing will also help to address another critique of the
GWAS/SNP/HapMap approach, namely its strict focus on single base pair changes in the
genome. Other changes in the genome, such as variations in the numbers of copies of repeated
elements or rearrangements, deletions or insertions of larger chunks of genomic DNA, might in
many cases be what is at the core of a disorder, necessitating (again) a shift in focus away from
point mutations and single genes to the genome as a whole (Lupski 1998, 2009).

As one of the first follow-ups to the original HGP, the HapMap project was a topic that often
came up in discussions of the legacy of the HGP. Such discussions became especially prominent
at the tenth anniversary of the publication of the draft genome sequence. In general, there was
an overwhelming sense of disappointment at what had come out of the HGP, at least in the
medical context. Given the grand promises that were made both around the start of the project in
the 1980s and then again in the year 2000 at the presentation at the White House,[21] it is not
surprising that people were unimpressed by what had been delivered by 2010/2011.
Interestingly, it was not only the usual suspects, such as Lewontin (2011), but also key
proponents of the HGP itself who were critical and pointed out the minimal medical advances
that had been achieved in the first post-HGP decade (Collins 2010; Venter 2010).

However, one thing that all critics, including the above-mentioned, agreed on was that even
though its effect on medical practice had been negligible, the HGP had transformed biological
research (see for instance Wade 2010; Varmus 2010; Hall 2010; Butler 2010; Green et al.
2011). One area in which genomic research had fundamentally changed both concepts and
practices was in the understanding of what a gene is and how gene expression works and is
regulated (Keller 2000; Moss 2003; Dupré 2005; Griffiths & Stotz 2006; Stotz et al. 2006;
Check 2010). With great foresight, Evelyn Fox Keller pointed out as early as 2000 that the HGP
was interesting not so much because of the raw sequence it produced, but more because of the
transformations it brought about in our expectations when it comes to ‘genes’ and DNA (Keller
2000).

3.1.4 The HapMap, Genomics and Race

As mentioned above, HapMap’s use of samples from different populations brought the concept
of race into discussions of the project. Studies that looked into the genetic variation between
population groups (of which the HapMap was a key representative) are among several recent
developments (Duster 2015) that reignited discussion about a) the biological reality of race and
b) the question whether racial classifications should be used in biomedical research at all.
Several authors have picked up the relation between the HapMap project and a renewed concern
with race (see, e.g., Ossorio 2005; Duster 2005; Hamilton 2008). The question that dominates
these discussions is whether racial classifications reflect a ‘biological reality’.

Race has of course been an important topic in epidemiology and clinical research for a long time
(Witzig 1996; Stolley 1999), but it has been widely perceived as a socially constructed category
that has no biological basis.[22] And many researchers imagined that as the HGP demonstrated
how highly similar any two human beings are to each other at the DNA level, any idea of race
as a serious biological concept would be disposed of once and for all (see, e.g., Gilbert 1992;
Venter 2000). But the concept of biological race was, if anything, rejuvenated rather than laid to
rest by the developments in genomics (Kaufman & Cooper 2001; Foster & Sharp 2002;
Hamilton 2008; Roberts 2011). This is exemplified by the fact that more and more scientists
have claimed in recent years that there is a biological basis to our traditional notions of race,
basing their claims on elaborate statistical analyses of data on genetic variation derived from a
large number of human DNA samples. For many, these developments amounted to what Troy
Duster has called a ‘post-genomic surprise’ (Duster 2015).

An important point here is that linking genomics and race does not mean that researchers search
for, or even that there are, any ‘genes for race’, even if we consider the many different ways in
which this term can be interpreted (Dupré 2008). The discussion about the possible genetic basis
for race is now more subtle, as it is not simply concerned with the presence or absence of
specific genes or DNA elements and hence some sort of biological essence of races, but rather
with the variation in the frequencies of alleles in the population of interest (Gannett 2001,
2004). The question is therefore not whether DNA element X is absent or present in one
population or the other, but rather which variant of X is present at what frequency in a
population (in the context of the HapMap researchers will talk of SNP frequencies).

Data from population genetics shows that the global distribution of allele frequencies in the
human population is not discontinuous (Jorde & Wooding 2004; Feldman & Lewontin 2008) but
clinal, meaning that human DNA sequences vary in a gradual manner over geographic space
(Livingstone 1962; Serre & Pääbo 2004; Barbujani & Colonna 2010). Moreover, both genetic
and phenotypic traits display what is called ‘nonconcordant’ clinal variation, meaning that
different traits do not necessarily co-vary with each other; the pattern of how trait A varies
across geographic space might be very different from the pattern displayed by trait B
(Livingstone 1962; Goodman 2000; Jorde & Wooding 2004).

But despite these widely accepted findings, it is in the discussion of these distributions that the
idea of a biological basis for our traditional understanding of race classifications has re-
emerged. Based on the analysis of large sets of genetic variants in samples derived from various
locations around the globe, a number of researchers have made the claim that human genetic
variation displays geographical clustering (see, e.g., Rosenberg et al. 2002; Edwards 2003;
Burchard et al. 2003; Bamshad et al. 2003; Leroi 2005; Tang et al. 2005). Importantly, these
findings often also gave rise to, or were interpreted to support, the claim that this geographical
distribution matches our traditional racial classifications.

Such findings also led a number of authors to claim that race still has a valid place in
biomedical research: since these classifications are supposed to describe groups that are
internally genetically similar, but genetically different from other groups, they can serve as
useful proxies in estimating, for instance, the group member’s average risk of developing a
particular condition (see, e.g., Xie et al. 2001; Wood 2001; Risch et al. 2002; Rosenberg et al.
2002; Shiao et al. 2012). Some authors are more cautious and claim that race should only serve
as a loose and temporary proxy (Foster & Sharp 2002; Jorde & Wooding 2004) that should be
abandoned as soon as we know the actual genetic variations that are linked to a particular
condition or trait (Jorde & Wooding 2004; Leroi 2005; Dupré 2008). Such critics may note that
the most that these genetic studies show is that there is a correlation between a person’s genetic
variants and their geographical origin, if only because variants originate in a specific place; and
there is a loose relation between the socially constructed concept of race and geographic origin.
But given the tenuous connection that this generates between perceived or self-identified racial
categories and genetic constitution, race is a poor substitute for any actually salient genetic
information that may eventually be related to disease.

But there is also a significant group of researchers who are not convinced by these analyses and
who don’t think that there is any biological basis to the race concept (see, e.g., Schwartz 2001;
Duster 2005, 2006; Krieger 2000; Ossorio 2005). All of these authors criticise the above studies
and the geographic clusters of genetic variation they identify, mainly because of flaws in the
way samples are collected (see, e.g., Duster 2015) and how the data is ultimately analysed. The
latter criticism has mainly focused on the program ‘Structure’ that is used by a majority of the
studies mentioned above to churn out clusters of genetic variation (Bolnick 2008; Kalinowski,
2011; Fujimura et al. 2014). A telling criticism is that while Structure can be made to report that
there are five main geographical clusters that show distinctive allele frequencies and which
roughly match traditional notions of race (African, Asian, European, etc.), the programme can
equally be set up to report any arbitrarily selected number of genetically different groups, as the
user has to specify the number of clusters they are looking for before the Structure program is
applied to any actual dataset.
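The point about the pre-specified cluster number can be illustrated with any clustering method that takes K as an input. The toy one-dimensional k-means below is a stand-in for Structure’s far more sophisticated model-based approach, not a reimplementation of it: whatever K the user requests, K groups are reported back.

```python
import random

# Toy illustration of the criticism above: a clustering routine that takes
# the number of clusters K as input reports back K groups whatever the
# data look like. A stand-in for Structure, not a reimplementation of it.

def kmeans_1d(values, k, iters=50, seed=0):
    random.seed(seed)
    centers = random.sample(values, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # keep the old center if a group has emptied out
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return groups

# Hypothetical allele frequencies for one SNP across sampled individuals
freqs = [0.11, 0.13, 0.12, 0.52, 0.49, 0.51, 0.88, 0.91, 0.90, 0.31]
for k in (2, 5):
    print(k, [len(g) for g in kmeans_1d(freqs, k)])  # always k group sizes
```

The same data are happily partitioned into two groups or five; nothing in the output by itself tells the user which, if either, partition reflects a real division in the data.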

Two interesting aspects of these discussions are that they a) usually only deal with one way of
analyzing the biological reality of race classifications (as genetic) and b) adhere to a sharp
distinction between race as biological reality or as social construct. Regarding a) several
philosophers of biology have come up with alternative ways of thinking about a biological basis
for race (for instance race as clades (Andreasen 1998), inbred lines (Kitcher 1999), or ecotypes
(Pigliucci & Kaplan 2003)). This expansion of concepts brought with it the question of
classificatory monism vs. pluralism, i.e., the question whether there is one privileged way of
classifying race that somehow captures the ‘true nature’ of races (natural kinds) or whether there
are several ways of doing so, depending on theoretical or practical interests/context (Gannett
2010). As Gannett argues, however, this focus on the monism/pluralism debate and on natural
kinds comes at a cost, as it can mean that questions of practical significance are systematically
ignored (2010). Regarding b), Gannett points out that drawing a sharp distinction between race
as social construct or biological reality has not only been proven meaningless by recent work in
population genetics but can also mean that the much messier reality of human history and
diversity on this planet (and the complex interactions between scientific and social concepts of
race) is being overlooked, leading to an impoverished analysis of the problems at hand (Gannett
2010).

3.2 Metagenomics

Metagenomics (also referred to as ‘environmental’ or ‘community’ genomics) is a research field
that aims to analyse the collective genomes of microbial communities. These communities are
usually extracted from environmental samples, ranging from soil to water or even air samples.
A major advantage of metagenomics is that it does not rely on techniques for culturing
microbes. This is important because only an estimated 1%–5% of all microbes can be cultured
at all (Amann et al. 1995), an issue that has been referred to as the ‘great plate count anomaly’
(Staley & Konopka 1985).[23]

The term ‘metagenomics’ was coined in 1998 (Handelsman et al. 1998). The prefix ‘meta’
in ‘metagenomics’ can be read in at least three different ways (O’Malley 2013): 1) As referring
to the fact that metagenomics transcends culturing limitations. 2) As emphasising the
aggregate-level approach to biology that characterises metagenomics (looking beyond single
entities (cells or genomes)). And 3) as referring to the goal of creating an overarching
understanding of the genomic diversity of the microbial realm.

The methodology of metagenomics can be described as a four step process, consisting of: 1) the
collection of environmental samples, 2) the isolation of microbial DNA from these samples, 3a)
the direct analysis of the DNA or 3b) the creation of a genomic DNA library by fragmentation
and insertion of the sampled DNA into suitable vectors (for instance plasmids that can be
propagated in laboratory bacterial strains). These genomic libraries can then be used to 4a)
sequence or 4b) perform a functional screen of the sampled genomic DNA. As the distinction
between steps 4a) and 4b) already implies, metagenomics can be divided into a sequence- and a
function-based approach (Gabor 2007; Sleator et al. 2008). In the former the collected DNA is
sequenced so that potential genes present in the sample can be identified and, if feasible, the
genomes of all the microbes that were present in the sample can be reconstituted.
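As a toy example of what ‘identifying potential genes’ involves at its most elementary, the sketch below scans a hypothetical sampled DNA fragment for simple open reading frames (ORFs); real metagenomic pipelines use far more sophisticated gene-prediction and binning tools:

```python
# Minimal sketch of an elementary step in sequence-based analysis:
# scanning a (hypothetical) sampled DNA fragment for forward-strand open
# reading frames (ORFs) as candidate protein-coding regions.

START, STOPS = "ATG", {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end) spans of simple forward-strand ORFs."""
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == START:
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOPS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # resume scanning after this ORF
                        break
            i += 3
    return orfs

fragment = "CCATGAAATTTGGGTAACC"  # toy fragment: ...ATG AAA TTT GGG TAA...
print(find_orfs(fragment))  # [(2, 17)]
```

Even this crude scan hints at the interpretive gap discussed later: finding candidate coding regions is easy compared with establishing what, if anything, they do.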

The sequence-based approach is feasible due to the vastly reduced costs of sequencing and the
increased computing power available. The goal of the approach is to get an idea of the diversity
and distribution of microbes present in the sample and to also get an insight into their
functioning (for instance by identifying metabolism-related enzymes that can give clues about
the metabolic pathways active in the different microbes). This can give insights into the
workings of the microbial ecosystem present in the sampled environment more generally.

In the functional approach the fragments of DNA that are stored in the library are used in what
is often called a ‘functional screen’. To perform such a screen the researchers introduce the
library plasmids into specific bacterial strains which then read and express any protein-coding
sequence that might be present on the fragments, thereby producing the protein(s) the fragment
codes for.[24] The key to a functional screen is to create conditions in which only those bacteria
that express a protein with the function of interest can be singled out (for instance by making
sure that only those cells survive). Once the cells are singled out the library plasmid they
contain can be recovered and sequenced allowing the researcher to identify the protein(s)
encoded by that fragment. Functional metagenomics is often used to identify novel microbial
proteins that can be used in biotechnological and pharmaceutical contexts and it is not
surprising that metagenomics was and still is of great interest to the biotechnological sector
(Streit & Schmitz 2004; Lorenz & Eck 2005; Culligan et al. 2014; Ekkers et al. 2012).

One of the first actual (sequence-based) metagenomics projects was performed (yet again) by
one of the pioneers of genomics, Craig Venter. The goal of Venter and his team was to sample
microbes from the surface of the nutrient-poor Sargasso sea (Venter et al. 2004). This particular
environment was chosen for this pilot study because it was expected to have a microbial
community with relatively low diversity. This assumption turned out to be wrong and the
project identified more than a million putative protein-coding sequences derived from at least
1800 different genomic species extracted from the sea water.

Another early metagenomics study consisted of the analysis of an acidophilic biofilm with low
microbial diversity from an acid mine drainage site in California (Tyson et al. 2004). The analysed
biofilm survives in one of the most extreme environments known, combining a very low pH (i.e., high
acidity), relatively high temperatures and high concentrations of metals. Importantly, this specific
biofilm truly displays low complexity as it is composed of only three bacterial and two archaeal
species. This simplicity greatly aided the analysis effort and allowed the researchers an almost
complete recovery of two of the genomes and a partial recovery of the other three.

There have been many other metagenomics studies conducted since and there is little point in
listing them here, as the list is growing by the month. One aspect of the ongoing research that is
important to point out, however, is that the projects are becoming increasingly ambitious. The
trend now is not just to have an integrated view on the genomes but to combine metagenomics
with other techniques such as metabolomics (the assay of small molecules present in a system),
metatranscriptomics (the analysis of all RNA transcripts of a community of microbes) and
viromics (the analysis of all the viral genomes present in the system of interest) (see Turnbaugh
& Gordon 2008; Bikel et al. 2015). In a sense the field is moving towards a highly integrated
meta-metagenomics approach (Dupré & O’Malley (2007) talk of “metaorganismal
metagenomics”). This is also in line with the general trend towards big-data and discovery-
based approaches in the life sciences (Ankeny & Leonelli 2015; Dolinski & Troyanskaya 2015;
Leonelli 2014, 2016).

The rise of metagenomics is also linked to other changes in biological sciences more generally,
especially the rise of systems biology starting around the year 2000 (which is itself closely
linked to the development of genomics since the 1990s). O’Malley and Dupré (2005) point out
that there is an important distinction to be made when looking at fields like systems biology,
because there is not only a change in epistemology but also one in ontology. They therefore
distinguish between pragmatic and systems-theoretic biologists. For the former, the idea of a
‘system’ is merely an epistemic tool. For the latter, the system becomes the new fundamental
ontological unit. Doolittle and Zhaxybayeva (2010) claim that the same can be seen in
metagenomics where there is a drive to see the community or the ecosystem as the new
fundamental unit, and not the single species (see also Dupré & O’Malley 2007).

Moving away from a focus on single organisms or monogenomic species allows us to make
better sense of many recent findings in microbiology (in which metagenomics has played a key
role). Central to all of this are mobile DNA elements that can travel horizontally, meaning
between different members of a community (including between different kinds of organisms).
Obtaining such mobile DNA elements can have a crucial effect on the survival and reproduction
capacity of the recipient cell. Mobile DNA can therefore be a key element in the evolutionary
processes as it becomes a ‘communal resource’ (McFall-Ngai et al. 2013). Acquired antibiotic
resistance is only one of many benefits cells are known to obtain through acquired DNA
elements.

What is preserved over evolutionary time, then, is the composition of functional elements that
the community as a whole contains. The community could thus be seen as an assembly of
biochemical activities and not of distinct microbial lineages (see for instance Turnbaugh et al.
2009 and also Burke et al. 2011). The metagenome then becomes a ‘genome of communities’
and not a ‘community of genomes’ (Doolittle & Zhaxybayeva 2010). All of this also feeds into
the more general, and currently very active, discussion about the problem of individuality in
biology (Clarke 2010; Bouchard & Huneman 2013; Ereshefsky & Pedroso 2013; Guay & Pradeu
2015; SEP entry on the biological notion of individual).

Apart from these issues in biological ontology, there are also epistemological issues raised by
metagenomics, namely the discrepancy between our ability to sequence DNA and to interpret it.
These discussions about the challenges of DNA sequence interpretation are not just a problem
for (meta)genomics and other -omics approaches, but also for biomedicine more generally and
its push towards a truly personalised medicine. A key issue for this push is the discrepancy
between the (ever-decreasing) costs of obtaining a personal genome sequence (Bennett et al.
2005; Mardis 2006; Check 2014a,b) and the high costs of making sure the data can be
appropriately interpreted (Mardis 2006; Sboner et al. 2011; Phillips et al. 2015). This problem is
related to the so-called ‘bioinformatic bottleneck’: it is the handling and interpretation of the
large amounts of sequence data that now provides the main obstacle to progress (Green et al.
2011; Desai et al. 2012; Scholz et al. 2012; Marx 2013). In the days of next-generation
sequencing, the sequencing step itself is no longer the rate-limiting step.

4. Outlook

Genomics is now an integral part of all of the life sciences. Not that every life scientist is now a
genomicist—there are still researchers who focus on the biochemistry, development, or the
molecular networks of human cells and other organisms. But the DNA sequences of the human
genome and the numerous model organisms that came out of the HGP enter every laboratory, if
not on a daily basis then at least at some stage of every research project. The same applies to the
maps of genetic variation that were discussed in Section 3 and to the (somewhat controversial)
data on functional DNA elements that the ENCODE project generated (see the supplementary
document The ENCODE Project and the ENCODE Controversy).

And it is not just the quantity of data and the many new “-omes” that researchers now work with
that have transformed the science. As we have pointed out in several places, insights into the
genome and its functioning have transformed researchers’ understanding of the entities and
processes they are working with in the course of the last few decades. Part of this was also a
transformation in our understanding of what it means to do ‘good’ science. What the HGP and
its various offshoots have achieved, therefore, is to change the life sciences at the
epistemological, the ontological and also the methodological level.

As so often, an interesting and even pressing question is where all of this is going. Predicting
the future might not be possible, but there are trends that can be identified and which can be
expected to follow a similar trajectory in the near future. One such trend is the drive for big
data. ‘Big’ here refers not only to the quantity but also to the different types of data collected. A
derivative of this big-data drive is the goal of integrating all of the diverse data and moulding it
into models that can further our understanding of biological systems and the prediction of their
behaviour. The relatively young discipline of systems biology, which could not be discussed in
detail in this entry, will certainly play a key role in this endeavour.

Supplement

The ENCODE Project and the ENCODE Controversy

The ENCyclopedia Of DNA Elements (ENCODE) project was an international research effort
funded by the National Human Genome Research Institute (NHGRI) that aimed to identify all
functional elements (FE) in the human genome (ENCODE Project Consortium 2004). FEs
include, for instance, protein-coding regions, regulatory elements such as promoters, silencers
or enhancers and sequences that are important for chromosomal structure. The project, which
began in 2003 and included 442 researchers during its main production phase, came to a
conclusion in 2012 with the publication of 30 papers across a range of journals (ENCODE
Project Consortium 2012; Pennisi 2012). Similarly to the HapMap project, ENCODE was
presented as the logical next step after the sequencing of the genomic DNA, since tackling the
interpretation of the sequences was now seen as the top priority (ENCODE Project Consortium
2004).

The ENCODE project incited a heated debate in academic journals, the blogosphere and also in
the national and international press. The crucial claim that incited much ire was the project’s
conclusion that 80.4% of the human genomic DNA has a ‘biochemical function’ (ENCODE
Project Consortium 2012). To understand the strong reaction this statement provoked we have
to turn our focus again to the C-value paradox and the concept of ‘junk DNA’ (see Section 2.3
of the main text). In the context of the ENCODE controversy this debate was linked with the
issue of how to define a ‘functional element’ and how scientists ascribe functions in biological
systems. What the ENCODE research implied, at least in the eyes of some commentators, was
that the idea of junk DNA was proven wrong, because almost all of our DNA turned out to be
functional. This led to claims that textbooks will have to be re-written, as they still describe the
genome as mainly composed of junk.[S1] The defenders of the old view claimed that the
ENCODE researchers set far too low a bar in ascribing functions to elements of biological
systems.

The Methodology of the ENCODE Project

The ENCODE project used a range of different experimental assays to analyse what they
referred to as ‘sites of biochemical activity’ (for an overview of the ENCODE output see Qu &
Fang 2013). These are sites at which some sort of modification can be identified (for instance
methylation) or to which an activity (such as transcription of DNA to RNA) can be ascribed.
These modifications or activities were taken as strong indications that the identified regions of
the genomic DNA play a functional role in human cells.

As an example of how this approach worked, ENCODE researchers were interested in finding
out how much of the genomic DNA is involved in the regulation of gene expression.
Researchers postulate that a key hallmark of all regulatory DNA elements is their accessibility.
This makes sense as the regulatory and transcriptional machinery need access to these DNA
sites. ENCODE used this feature of regulatory DNA to map (putative) regulatory elements in
the human genome. One way to do so is to perform what is called a ‘DNase I hypersensitivity
assay’. DNase I is an enzyme that cuts DNA, and it does so more readily when the template
DNA is accessible, meaning that highly accessible regions are hypersensitive to DNase I
activity. The behaviour of the genome in the DNase I hypersensitivity assay can therefore be
used to learn indirectly about its structure, from which researchers then infer the presence of a
functional element (in this case a regulatory sequence). This is just one example of about 24
different types of assays that ENCODE researchers used to get a better insight into the number
and distribution of functional elements in the human genome (for a discussion of the different
types of experimental approaches used in ENCODE see Kellis et al. 2014).

What is interesting about most of these assays is that they look at a proxy for function: if a
stretch of DNA is hypersensitive to DNase I then it is automatically defined as functional.
Another example is DNA transcription itself. If a DNA sequence shows up in RNA sequencing
then this means it has been transcribed into RNA by the enzyme RNA polymerase. This
activity, in the eyes of the ENCODE researchers at least, makes the DNA element in question a
functional element of the genome.

But such a broad approach to finding out about functional elements is highly problematic, as a
transcription event or hypersensitivity can be present for many different reasons (for instance as
a result of transcriptional noise). This is exactly what some critics of ENCODE homed in on,
pointing out that merely showing the existence of a structure (such as methylation) or a process
(such as transcription) is not enough by itself to prove any functional significance of these
biochemical features (Doolittle 2013; Eddy 2012; Graur et al. 2013; Niu & Jiang 2013).

Whilst this is surely a valid point that applies to a large part of the research done within
ENCODE, not all studies performed as part of the project looked at such proxies. An example is
Whitfield et al. (2012), who did not just look at specific modifications or the behaviour of DNA
in particular assays but mutated specific sites to check whether the interference with these sites
has an effect on gene expression.

The above argument about how we learn about functional elements presumes that we already
have an understanding of what it means to be ‘functional’. But it is by no means clear how
biological function should be defined and there are competing accounts of what it is that makes
an entity functional. These discussions about the concept of a biological function were central to
the dispute surrounding the ENCODE project.

The ENCODE Controversy

Especially in the early critiques by Doolittle (2013) and Graur et al. (2013) the distinction
between the ‘selected effect’ (SE) function and the ‘causal role’ (CR) function of an entity or
process figured prominently. The ENCODE project, according to these critics, simply ignored
key work by philosophers and theoretical biologists on this topic, thereby making a muddle of
what they are talking about when they use the term ‘function’. With more conceptual clarity, they
argued, the claim that 80% of our DNA is functional would not be tenable and the established
notion of ‘junk DNA’ would be saved.

The definition of function and functional analysis in biology deserves an SEP entry of its own.
Here we will limit ourselves to a few comments on this issue that relate to the ENCODE
controversy specifically. The key point is that SE functions are functions that are assigned to
conserved sequences. The SE account aims to answer the question of why an element is there: a
functional element according to this definition is an entity whose presence has a positive effect
on the survival or reproduction of the organism; meaning the entity has been selected for
(Millikan 1984, 1989a; Neander 1991; Griffiths 1992, 1993; Godfrey-Smith 1994). If a gene has
been selected for it is expected that its sequence will be conserved: mutations within it will be
selected against, and will be less frequent than in sequences that are not being maintained by
selection. History matters for this account, which is also why it is sometimes referred to as the
etiological account of function (going back to a paper by Wright (1976); but see Millikan
(1989a) on how the etiological account relates to Wright’s original account).

CR functions on the other hand do not depend on the history of the system. What the CR
account answers are ‘how’ questions in relation to the capacities of a system (Millikan 1989b).
It is only the here and now that matters for the CR account: functional analysis is about
analysing a system with capacity C into sub-capacities that are attributed to elements of the
system and which contribute to C (Cummins 1975). The CR account is in an important sense
more liberal than the etiological account: according to CR anything can be deemed a functional
element as long as it is part of a system and plays some causal role contributing to some system
capacity we happen to be interested in.

Graur et al. (2013) claim that ENCODE worked with the CR account but that this is a mistake,
as biologists actually work with the SE account, a claim that can also be found in Doolittle
2013. They acknowledge that biologists might study CR functions (for instance when doing
deletion experiments) but claim that even if scientists do so they take these causal roles simply
as indicative of SE function, which is the ‘true’ function of a biological element (Doolittle 2013;
Graur et al. 2013). It is with this focus on SE functions that these critics bring us back to the C-
value paradox and the strong case one can make for the importance of the junk DNA concept.

The deep problem is that there is simply not enough conserved DNA in humans to match the
high percentage of functional DNA the ENCODE project came up with. If we accept current
estimates that between 5 and 10% of the human genome is conserved (Lindblad-Toh et al. 2011;
Ward & Kellis 2012), then there is clearly no correlation between the amount of sequence under
evolutionary constraint and what is called ‘functional’ by the ENCODE consortium. Graur et al.
(2013) call this the ‘ENCODE incongruity’.[S2]

As already pointed out above this critique is based on a claim about which functional account is
actually used by scientists. This appears to be taken as an empirical claim, though perhaps what
matters more is how scientists ought to understand functional language. This, in turn, is likely to
depend on what their aims are. Either way, this is an important point, because once we think in
terms of SE functions DNA conservation immediately becomes salient. If, however, it turns out
that scientists don’t (or shouldn’t) use the SE account (as is claimed, for instance, by Elliott et
al. 2014; Germain et al. 2014; Amundson & Lauder 1994; Griffiths 1994, 2006) this critique
loses much of its force as the ENCODE incongruity is no longer a problem.

This is exactly the point that a more recent critique of the critics picks up on. Germain et al.
(2014) claim that the critics of ENCODE simply misunderstood the nature of the project, as
they did not take into account that the ENCODE project was part of a biomedical discovery
process. As such the project was concerned to find out about elements of the human genome
that might engage in relevant biochemical processes. What makes a sequence or activity
relevant in the biomedical context is not whether it is conserved but whether its absence or
presence has a potential effect on activities or entities that are of relevance to biomedical
research. The CR account, Germain and co-workers claim, is therefore the right account to use
in this context and the ‘ENCODE incongruity’ is no longer a relevant issue.

In all of this the ENCODE researchers themselves did not stay silent. It is interesting to note
that in a reply to their critics key ENCODE members toned down their claims about the
percentage of functional elements present in the human genome: the 80.4% figure is not
mentioned again (Kellis et al. 2014). In fact, no numbers are mentioned in this paper and the
authors remark that in their opinion creating an open access resource (i.e., the ENCODE library)
is “far more important than any interim estimate of the fraction of the human genome that is
functional” (2014: 6136). The authors also point out that in their eyes all experimental and
theoretical approaches to functional ascription have their limitations and that no single account
or assay will get it right on its own, which is why they advocate both methodological and
theoretical pluralism, again defusing many of the stronger claims made earlier on both sides of
the dispute.

The issue Germain et al. (2014) raise concerns the type of scientific project ENCODE is. As in
the context of the HapMap project, we encounter the long-standing dispute about the value of
hypothesis-free or exploratory research (see Section 3.1.3 of the main text). Eddy (2013), for
instance, laments that the project was originally a mapping project but was then spun
retrospectively as a project that aimed at testing a hypothesis. Graur et al. (2013) also make the
point that ENCODE overstepped the remit of a big science project (which, they claim, is
simply to provide data) and that the ENCODE researchers ventured into ‘small science’
territory by trying to deliver an interpretation of that data. In contrast to the criticisms the HGP
originally encountered, these modern-day critics no longer have a problem with the idea of a
descriptive mapping project; their worry is rather that the project was sold as something it isn’t.

Notes

1. DNA methylation refers to the process in which a methyl group (-CH3) is covalently added to
another molecule (in this case DNA). The importance of this process will be discussed further
below.

2. A minor problem that we shall not discuss is that some viruses do not contain DNA at all, but
use RNA to serve their basic genetic functions. This is a fairly technical problem and not too
hard to resolve with a qualification to the definition. We should note, however, that it is not a
problem at all for the account of the genome that we favour (below).

3. Though perhaps this is not an indefensible move. A more radical way to approach this issue is
to abandon the idea of ‘the’ genome of an organism and treat both the organism and its genome
as complex symbiotic wholes, “holobionts”, polygenomic entities with multiple and diverse
genomes (Dupré 2010; reprinted as Dupré 2012: ch. 7).

4. Non-coding DNA is defined as DNA that does not code for proteins. It is estimated that only
about 1.5% of human chromosomal DNA codes for proteins (International Human Genome
Sequencing Consortium 2001, 2004). The ENCODE project (see the Supplement) and other
recent research on DNA transcription has shown that a large part of chromosomal DNA (up to
74%) is transcribed into RNA, the majority of which does not code for proteins (which is why it
is also called non-coding RNA (ncRNA)), but a substantial though debated proportion of which
is known to serve some function. For an overview of types of ncRNA see Wright and Bruford
2011. It is estimated by some that there could be as many or more genes for non-coding RNA
than for coding RNA, bringing up the gene count from 20,000 to at least 40,000 or more
(Harrow et al. 2012; Clark et al. 2013).

5. Keller (2011) identifies four different definitions of the term: ‘genome-1’ is defined as the set
of genes, ‘genome-2’ as the set of chromosomes in a cell. ‘Genome-3’ is defined as the
organism’s DNA, whereas ‘genome-4’ refers to all the genetic material of an organism.

6. For further discussion of the broader processual view of biology of which this view of the
genome is a vital part, see Dupré 2012; Nicholson and Dupré forthcoming.

7. We shall not address here the fascinating history of sequencing technologies, from the
laborious manual techniques of the 70s and 80s, to the machines that today can provide
gigabytes of sequence data in a few hours. Note that accuracy is as crucial as speed, and in
actual sequencing practice a large number of runs is required to reduce errors.

8. These discussions about the different modes of the experimental sciences (exploratory vs.
hypothesis- or theory-driven experimentation) also became an important topic in philosophy of
science (see, e.g., Steinle 1997; Burian 1997; Franklin 2005; Elliott 2007; O’Malley 2007;
Waters 2007a; Karaca 2013). Philosophers of science also tackled the important issue of how
these different modes of research are integrated with each other (Burian 2007; O’Malley et al.
2010; O’Malley & Soyer 2012).

9. Fittingly, the first genome of a single person to be published was that of James D. Watson
himself, whose sequence was released online in 2007 and published in an academic journal in
2008 (Wheeler et al. 2008). The online release of Watson’s genome was quickly followed by
the publication of Craig Venter’s genome (Levy et al. 2007).

10. By enlarging the range of sequenced genomes the HGP also had a crucial impact on what is
now called ‘comparative genomics’. The power of this discipline lies in the insights it can give
us into the relations between different organisms or species at the genetic level (Touchman
2010). The comparison can take place at different resolutions (overall genome size, number of
chromosomes, genes or, most importantly perhaps, nucleotide-by-nucleotide alignments of
different sequences). The field has been of crucial importance to phylogenetics and has also
attracted the attention of philosophers (see, e.g., Moss 2006; Piotrowska 2009; Perini 2011).
Even though, due to space restrictions, we could not include a section dedicated to comparative
genomics, we will discuss crucial aspects of this practice throughout this entry, as its tools and
methods play a key role in many of the projects we touch upon (such as the HapMap discussed
in Section 3.1 or the ENCODE project discussed in a supplement).

11. Note that we can find the term ‘junk DNA’ earlier, for instance in a paper by Ehret and De
Haller (1963), where the authors mention that

[w]hile current evidence makes plausible the idea that all genetic material is DNA (with the
possible exception of RNA viruses), it does not follow that all DNA is competent genetic
material (viz. “junk” DNA) […]. (Ehret & De Haller 1963: 39)

See Graur 2013, for more on the history of the term ‘junk DNA’.

12. Note how assumption (a) neglects the key examples that gave rise to the C-value paradox.

13. As Evelyn Fox Keller (2011) points out, this concept of junk DNA became a staple of
genomic thinking in the 1980s, in particular through the publication of two papers (Doolittle &
Sapienza 1980; Orgel & Crick 1980) that linked Richard Dawkins’ (1976) concept of selfish
DNA to Ohno’s notion of junk DNA.

14. The official NCBI website containing the HapMap resource data was decommissioned in
June 2016 due to a security issue that was uncovered in a computer security audit (see the link
to ‘Decommissioned NCBI HapMap Resource’ in the Other Internet Resources section). The
NCBI justifies the decision to decommission the resource by a decline in usage that could be
observed over recent years. This has sparked a conversation amongst scientists who believe in
the continuing importance of the HapMap resource. Parts of these discussions can be found on
Twitter under the hashtag #saveHapMap.

15. The term ‘allele’ is often used to refer to different variants of a gene but the term is now
used more liberally and can simply refer to variants of any locus on chromosomal DNA. See for
instance talking glossary: allele or scitable: allele under OIR.

16. Note that 1% is an arbitrary value. Some authors, for instance, set the threshold at 5%.

17. Meiosis is the special type of cell division that happens only in an organism’s reproductive
cells (gametes). During meiosis homologous chromosomes (i.e., the maternal and paternal
version of each chromosome) become fragmented and are then rejoined in a process called
‘crossing-over’. This process leads to the (homologous) recombination of parts of the two
chromosomes. If a crossing over happens between two different loci on a chromosome the
alleles at these two loci become separated. If the loci are closer to each other then the likelihood
of a crossing over event taking place between them is smaller, leading to a higher association
level between the two alleles.

18. For a short history of the CD/CV hypothesis see Box 2 in Visscher et al. 2012.

19. Francis Collins was quoted as saying that the HapMap will be “the single most important
genomic resource for understanding human disease, after the sequence” (Couzin 2002).

20. Hall (2010) also cites other prominent researchers, such as Walter Bodmer or David
Goldstein, who were critical of the HapMap project.

21. Thus US president Bill Clinton, speaking on June 26, 2000, at the presentation of the draft
genome at the White House:

With this profound new knowledge, humankind is on the verge of gaining immense, new power
to heal. Genome science will have a real impact on all our lives—and even more, on the lives of
our children. It will revolutionize the diagnosis, prevention and treatment of most, if not all,
human diseases. (White House 2000)

22. For an overview and discussion of the literature on race (and ethnic categories) as socially
and historically constructed see Fujimura et al. 2014: 209. Lisa Gannett claims that the concept
of race has never really been abandoned in biology but was rather transformed from a
typological into a population-based understanding of race (Gannett 2001). For more on the
history of the race concept see Stolley (1999), Marks (2008), Yudell (2011), and the SEP entry
on race.

23. The anomaly consists of the discrepancy between the proportion of living microbes that can
be detected in the original sample using a microscope and the proportion of microbes of that
same sample that then grow on an agar plate, the key means of culturing microbes.

24. This step is usually performed in the bacterium E. coli. See Streit and Schmitz 2004,
National Research Council 2007, and Ekkers et al. 2012 for a discussion of the methodological
difficulties such an approach can encounter.

Notes to the Supplement

S1. Interestingly, the term ‘junk DNA’ does not appear in any of the main consortium
publications and there are no claims made about the ‘death’ of the junk DNA concept (see
ENCODE Project Consortium 2007, 2012). But as Germain et al. (2014) point out, we find the
term in commentaries such as Ecker et al. 2012 and Pennisi 2012, with Pennisi in particular
claiming that the ENCODE project has written a ‘eulogy’ for junk DNA. It is also these
editorials (or articles written in newspapers such as the New York Times) to which Doolittle
(2013) turned in his critique of the ENCODE project and its claims about the demise of the junk
DNA concept. But see Eddy (2013) for the claim that some of the ENCODE leaders themselves
spun the project towards the textbooks-are-wrong narrative in their attempts to popularize the
project.

S2. It has to be noted here that this is in fact something the ENCODE researchers themselves
acknowledged in the publication reporting on the pilot phase of the project (ENCODE Project Consortium
2007). The fact that their assays picked up so much more biochemical activity than what would
be expected if we only look at conserved regions of the genome was something they repeatedly
highlighted and also tried to explain (ENCODE Project Consortium 2007).

Bibliography

• Adams, M.D., et al., 2000, “The Genome Sequence of Drosophila melanogaster”, Science,
287(5461): 2185–2195. pmid:10731132
• Amann, Rudolf I., Wolfgang Ludwig, and Karl-Heinz Schleifer, 1995, “Phylogenetic
identification and in situ detection of individual microbial cells without cultivation”,
Microbiological Reviews, 59(1): 143–169. pmcid:PMC239358
• Amundson, Ron and George V. Lauder, 1994, “Function Without Purpose: The Uses of
Causal Role Function in Evolutionary Biology”, Biology and Philosophy, 9(4): 443–469.
doi:10.1007/BF00850375
• Andreasen, Robin O., 1998, “A New Perspective on the Race Debate”, The British
Journal for the Philosophy of Science, 49(2): 199–225. doi:10.1093/bjps/49.2.199
• Ankeny, Rachel A., 2001, “Model organisms as models: understanding the ‘Lingua
Franca’ of the human genome project”, Philosophy of Science, 68(3): S251–S261.
• Ankeny, Rachel A. and Sabina Leonelli, 2015, “Valuing Data in Postgenomic Biology:
How Data Donation and Curation Practices Challenge the Scientific Publication System”, in
Postgenomics: Perspectives on Biology After the Genome, S.S. Richardson, and H. Stevens
(eds.), Chapel Hill, NC: Duke University Press, pp. 126–149.
• Arabidopsis Genome Initiative, 2000, “Analysis of the Genome Sequence of the
Flowering Plant Arabidopsis thaliana”, Nature, 408(6814): 796–815. doi:10.1038/35048692
• Baker, Monya, 2013, “Big Biology: The ‘omes Puzzle”, Nature, 494(7438): 416–419.
doi:10.1038/494416a
• Bamshad, Michael J., Stephen Wooding, W. Scott Watkins, Christopher T. Ostler, Mark
A. Batzer, and Lynn B. Jorde, 2003, “Human Population Genetic Structure and Inference of
Group Membership”, The American Journal of Human Genetics, 72(3): 578–589.
doi:10.1086/368061
• Barbujani, Guido and Vincenza Colonna, 2010, “Human Genome Diversity: Frequently
Asked Questions”, Trends in Genetics, 26(7): 285–295. doi:10.1016/j.tig.2010.04.002
• Barnes, Barry and John Dupré, 2008, Genomes and What to Make of Them, Chicago:
University of Chicago Press.
• Bennett, Simon T., Colin Barnes, Anthony Cox, Lisa Davies, and Clive Brown, 2005,
“Toward the $1000 Human Genome”, Pharmacogenomics, 6(4): 373–382.
doi:10.1517/14622416.6.4.373

• Bickmore, Wendy A. and Bas van Steensel, 2013, “Genome Architecture: Domain
Organization of Interphase Chromosomes”, Cell, 152(6): 1270–1284.
doi:10.1016/j.cell.2013.02.001
• Bikel, Shirley, Alejandra Valdez-Lara, Fernanda Cornejo-Granados, Karina Rico,
Samuel Canizales-Quinteros, Xavier Soberón, Luis Del Pozo-Yauner, and Adrián Ochoa-Leyva,
2015, “Combining Metagenomics, Metatranscriptomics and Viromics to Explore Novel
Microbial Interactions: Towards a Systems-level Understanding of Human Microbiome”,
Computational and Structural Biotechnology Journal, 13: 390–401.
doi:10.1016/j.csbj.2015.06.001
• Blattner, Frederick R., et al., 1997, “The Complete Genome Sequence of Escherichia
coli K-12”, Science, 277(5331): 1453–1462. doi:10.1126/science.277.5331.1453
• Bolnick, Deborah A., 2008, “Individual Ancestry Inference and the Reification of Race
as a Biological Phenomenon”, in Koenig et al. 2008: 70–85.
• Bouchard, Frédéric and Philippe Huneman (eds.), 2013, From Groups to Individuals:
Evolution and Emerging Individuality, Cambridge, MA: MIT Press.
• Brenner, Sidney, 1990, “The Human Genome: The Nature of the Enterprise”, Human
Genetic Information: Science, Law and Ethics, 149: 6–12. doi:10.1002/9780470513903.ch2
• Burchard, Esteban González, Elad Ziv, Natasha Coyle, Scarlett Lin Gomez, Hua Tang,
Andrew J. Karter, Joanna L. Mountain, Eliseo J. Pérez-Stable, Dean Sheppard, and Neil Risch,
2003, “The Importance of Race and Ethnic Background in Biomedical Research and Clinical
Practice”, New England Journal of Medicine, 348(12): 1170–1175.
doi:10.1056/NEJMsb025007
• Burian, Richard M., 1997, “Exploratory Experimentation and the Role of Histochemical
Techniques in the Work of Jean Brachet, 1938–1952”, History and Philosophy of the Life
Sciences, 19(1): 27–45.
• –––, 2007, “On MicroRNA and the Need for Exploratory Experimentation in Post-
Genomic Molecular Biology”, History and Philosophy of the Life Sciences, 29(3): 285–312.
pmid:18822659
• Burke, Catherine, Peter Steinberg, Doug Rusch, Staffan Kjelleberg, and Torsten
Thomas, 2011, “Bacterial Community Assembly Based on Functional Genes Rather Than
Species”, Proceedings of the National Academy of Sciences, 108(34): 14288–14293.
• Bustin, Michael and Tom Misteli, 2016, “Nongenetic Functions of the Genome”,
Science, 352(6286): 671, aad6933 (7 pages). doi:10.1126/science.aad6933
• Butler, Declan, 2010, “Human Genome at Ten: Science After the Sequence”, Nature,
465(7301): 1000–1001. doi:10.1038/4651000a
• C. elegans Sequencing Consortium, 1998, “Genome Sequence of the Nematode C.
elegans: A Platform for Investigating Biology”, Science, 282(5396): 2012–2018.
doi:10.1126/science.282.5396.2012
• Cardon, Lon R. and John I. Bell, 2001, “Association Study Designs for Complex
Diseases”, Nature Reviews Genetics, 2(2): 91–99. doi:10.1038/35052543
• Cargill, Michele, et al., 1999, “Characterization of Single-nucleotide Polymorphisms in
Coding Regions of Human Genes”, Nature Genetics 22(3): 231–238. doi:10.1038/10290
• Chakravarti, Aravinda, 1999, “Population Genetics—Making Sense Out of Sequence”,
Nature Genetics, 21(1 Suppl): 56–60. doi:10.1038/4482
• Check Hayden, Erica, 2010, “Human Genome at Ten: Life is Complicated”, Nature,
464(7289): 664–667. doi:10.1038/464664a
• –––, 2014a, “Is the $1,000 Genome for Real?” Nature News, (15 January 2014)
doi:10.1038/nature.2014.14530.
• –––, 2014b, “The $1,000 Genome”, Nature, 507(7492): 294–5. doi:10.1038/507294a
• Clark, Michael B., Anupma Choudhary, Martin A. Smith, Ryan J. Taft, and John S.
Mattick, 2013, “The Dark Matter Rises: the Expanding World of Regulatory RNAs”, Essays in
Biochemistry, 54: 1–16. doi:10.1042/bse0540001
• Clarke, Ellen, 2010, “The Problem of Biological Individuality”, Biological Theory,
5(4): 312–325. doi:10.1162/BIOT_a_00068

24
• Collins, Francis S., 2010, “Has the Revolution Arrived?” Nature, 464(7289): 674–675.
doi:10.1038/464674a
• Collins, Francis S. and Harold Varmus, 2015, “A New Initiative on Precision
Medicine”, New England Journal of Medicine, 372(9): 793–795. doi:10.1056/NEJMp1500523
• Couzin, Jennifer, 2002, “New Mapping Project Splits the Community”, Science,
296(5572): 1391. doi:10.1126/science.296.5572.1391
• Crollius, Hugues Roest, et al., 2000, “Estimate of Human Gene Number Provided by
Genome-wide Analysis Using Tetraodon nigroviridis DNA Sequence”, Nature Genetics, 25(2):
235–238. doi:10.1038/76118
• Culligan, Earmon P., Roy D. Sleator, Julian R. Marchesi, and Colin Hill, 2014,
“Metagenomics and Novel Gene Discovery: Promise and Potential for Novel Therapeutics”,
Virulence, 5(3): 399–412. doi:10.4161/viru.27208
• Cummins, Robert, 1975, “Functional Analysis”, The Journal of Philosophy, 72(20):
741–765. doi:10.2307/2024640
• Cutter, Amber R. and Jeffrey J. Hayes, 2015, “A Brief Review of Nucleosome
Structure”, FEBS Letters, 589(20): 2914–2922. doi:10.1016/j.febslet.2015.05.016
• Daly, Mark J., John D. Rioux, Stephen F. Schaffner, Thomas J. Hudson, and Eric S.
Lander, 2001, “High-resolution Haplotype Structure in the Human Genome”, Nature Genetics,
29(2): 229–232. doi:10.1038/ng1001-229
• Dawkins, Richard, 1976, The Selfish Gene, Oxford: Oxford University Press.
• Desai, Narayan, Dion Antonopoulos, Jack A. Gilbert, Elizabeth M. Glass, and Folker
Meyer, 2012, “From Genomics to Metagenomics”, Current Opinion in Biotechnology, 23(1):
72–76.
• Dolinski, Kara and Olga G. Troyanskaya, 2015, “Implications of Big Data for Cell
Biology”, Molecular Biology of the Cell, 26(14): 2575–2578. doi:10.1091/mbc.E13-12-0756
• Doolittle, W. Ford, 2013, “Is Junk DNA Bunk? A Critique of ENCODE”, Proceedings
of the National Academy of Sciences, 110(14): 5294–5300. doi:10.1073/pnas.1221376110
• Doolittle, W. Ford and Carmen Sapienza, 1980, “Selfish Genes, the Phenotype
Paradigm and Genome Evolution”, Nature, 284(5757): 601–603. doi:10.1038/284601a0
• Doolittle, W. Ford and Olga Zhaxybayeva, 2010, “Metagenomics and the Units of
Biological Organization”, Bioscience, 60(2): 102–112. doi:10.1525/bio.2010.60.2.5
• Dulbecco, Renato, 1986, “A Turning Point in Cancer Research: Sequencing the Human
Genome”, Science, 231(4742): 1055–1056. doi:10.1126/science.3945817
• Dupré, John, 2005, “Are There Genes?” Royal Institute of Philosophy Supplement, 56:
193–211. doi:10.1017/S1358246105056092
• –––, 2008, “What Genes Are, and Why There Are No ‘Genes for Race’”, in Revisiting
Race in a Genomic Age, Barbara A. Koenig, Sandra Soo-Jin Lee and Sarah S. Richardson
(eds.), New Brunswick, N.J.: Rutgers University Press, pp. 39–55.
• –––, 2010, “The Polygenomic Organism”, The Sociological Review, 58(s1): 19–31.
doi:10.1111/j.1467-954X.2010.01909.x
• –––, 2012, Processes of Life: Essays in the Philosophy of Biology, Oxford: Oxford
University Press.
• Dupré, John and Maureen A. O’Malley, 2007, “Metagenomics and Biological
Ontology”, Studies in History and Philosophy of Science Part C: Studies in History and
Philosophy of Biological and Biomedical Sciences, 38(4): 834–846.
doi:10.1016/j.shpsc.2007.09.001
• Duster, Troy, 2005, “Race and Reification in Science”, Science, 307(5712): 1050–1051.
doi:10.1126/science.1110303
• –––, 2006, “The Molecular Reinscription of Race: Unanticipated Issues in
Biotechnology and Forensic Science”, Patterns of Prejudice, 40(4–5): 427–441.
• –––, 2015, “A Post-Genomic Surprise: The Molecular Reinscription of Race in Science,
Law and Medicine”, The British Journal of Sociology, 66(1): 1–27. doi:10.1111/1468-4446.12118

• Ecker, Joseph R., Wendy A. Bickmore, Inês Barroso, Jonathan K. Pritchard, Yoav
Gilad, and Eran Segal, 2012, “Genomics: ENCODE Explained”, Nature, 489(7414): 52–55.
doi:10.1038/489052a
• Eddy, Sean R., 2012, “The C-value Paradox, Junk DNA and ENCODE”, Current
Biology, 22(21): R898–R899. doi:10.1016/j.cub.2012.10.002
• –––, 2013, “The ENCODE Project: Missteps Overshadowing a Success”, Current
Biology, 23(7): R259–R261.
• Edwards, A.W.F., 2003, “Human Genetic Diversity: Lewontin’s Fallacy”, BioEssays,
25(8): 798–801. doi:10.1002/bies.10315
• Ehret, Charles F. and Gérard De Haller, 1963, “Origin, Development, and Maturation of
Organelles and Organelle Systems of the Cell Surface in Paramecium”, Journal of
Ultrastructure Research, 9(Suppl 6): 1–42. doi:10.1016/S0022-5320(63)80088-X
• Eichler, E.E., J. Flint, G. Gibson, A. Kong, S.M. Leal, J.H. Moore, and J.H. Nadeau,
2010, “Missing Heritability and Strategies for Finding the Underlying Causes of Complex
Disease”, Nature Reviews Genetics, 11(6): 446–450. doi:10.1038/nrg2809
• Eisen, Jonathan A., 2012, “Badomics Words and the Power and Peril of the ome-
meme”, GigaScience, 1(1): 6. doi:10.1186/2047-217X-1-6
• Ekkers, David Matthias, Mariana Silvia Cretoiu, Anna Maria Kielak, and Jan Dirk van
Elsas, 2012, “The Great Screen Anomaly—A New Frontier in Product Discovery Through
Functional Metagenomics”, Applied Microbiology and Biotechnology, 93(3): 1005–1020.
doi:10.1007/s00253-011-3804-3
• Elliott, Kevin C., 2007, “Varieties of Exploratory Experimentation in Nanotoxicology”,
History and Philosophy of the Life Sciences, 29(3): 313–336.
• Elliott, Tyler A., Stefan Linquist, and T. Ryan Gregory, 2014, “Conceptual and
Empirical Challenges of Ascribing Functions to Transposable Elements”, The American
Naturalist, 184(1): 14–24. doi:10.1086/676588
• ENCODE Project Consortium, 2004, “The ENCODE (ENCyclopedia of DNA
elements) Project”, Science, 306(5696): 636–640. doi:10.1126/science.1105136
• –––, 2007, “Identification and Analysis of Functional Elements in 1% of the Human
Genome by the ENCODE Pilot Project”, Nature, 447(7146): 799–816. doi:10.1038/nature05874
• –––, 2012, “An Integrated Encyclopedia of DNA Elements in the Human Genome”,
Nature, 489(7414): 57–74. doi:10.1038/nature11247
• Ereshefsky, Marc and Makmiller Pedroso, 2013, “Biological Individuality: the Case of
Biofilms”, Biology & Philosophy, 28(2): 331–349. doi:10.1007/s10539-012-9340-4
• Feldman, Marcus W. and Richard C. Lewontin, 2008, “Race, Ancestry, and Medicine”,
in Koenig 2008: 89–101.
• Fields, Stanley and Mark Johnston, 2002, “A Crisis in Postgenomic Nomenclature”,
Science, 296(5568): 671–672. doi:10.1126/science.1070208
• Fleischmann, R.D., et al., 1995, “Whole-genome Random Sequencing and Assembly of
Haemophilus influenzae Rd.” Science, 269(5223): 496–512. doi:10.1126/science.7542800
• Foster, Morris W. and Richard R. Sharp, 2002, “Race, Ethnicity, and Genomics: Social
Classifications as Proxies of Biological Heterogeneity”, Genome Research, 12(6): 844–850.
doi:10.1101/gr.99202
• Franklin, L.R., 2005, “Exploratory Experiments”, Philosophy of Science, 72(5): 888–
899. doi:10.1086/508117
• Fujimura, Joan H., D.A. Bolnick, R. Rajagopalan, J.S. Kaufman, R.C. Lewontin, T.
Duster, P. Ossorio, and J. Marks, 2014, “Clines Without Classes: How to Make Sense of Human
Variation”, Sociological Theory, 32(3): 208–227. doi:10.1177/0735275114551611
• Gabor, Esther, Klaus Liebeton, Frank Niehaus, Juergen Eck, and Patrick Lorenz, 2007,
“Updating the Metagenomics Toolbox”, Biotechnology Journal, 2(2): 201–206.
doi:10.1002/biot.200600250
• Gannett, Lisa, 2001, “Racism and Human Genome Diversity Research: the Ethical
Limits of ‘Population Thinking’”, Philosophy of Science, 68(3): S479–S492.
doi:10.1086/392930

• –––, 2004, “The Biological Reification of Race”, The British Journal for the Philosophy
of Science, 55(2): 323–345. doi:10.1093/bjps/55.2.323
• –––, 2010, “Questions Asked and Unasked: How by Worrying Less About the ‘Really
Real’ Philosophers of Science Might Better Contribute to Debates About Genetics and Race”,
Synthese, 177(3): 363–385. doi:10.1007/s11229-010-9788-1
• Germain, Pierre-Luc, Emanuela Ratti, and Frederico Boem, 2014, “Junk or Functional
DNA? ENCODE and the Function Controversy”, Biology & Philosophy, 29(6): 807–831.
doi:10.1007/s10539-014-9441-3
• Gilbert, Walter, 1992, “A Vision of the Grail”, in The Code of Codes: Scientific and
Social Issues in the Human Genome Project, Daniel Kevles, and Leroy Hood (eds.), Cambridge,
MA: Harvard University Press, pp. 83–97.
• Godfrey-Smith, Peter, 1994, “A Modern History Theory of Functions”, Noûs, 28(3):
344–362. doi:10.2307/2216063
• Goffeau, A., et al., 1996, “Life with 6000 Genes”, Science, 274(5287): 546–567.
doi:10.1126/science.274.5287.546
• Goodman, Alan H., 2000, “Why Genes Don’t Count (For Racial Differences in
Health)”, American Journal of Public Health, 90(11): 1699. pmcid:PMC1446406
• Graur, Dan, 2013, “The Origin of the Term ‘Junk DNA’: A Historical Whodunnit”,
Judge Starling (blog), October 19, 2013, Graur 2013 available online.
• Graur, D., Y. Zheng, N. Price, R.B. Azevedo, R.A. Zufall, and E. Elhaik, 2013, “On the
Immortality of Television Sets: ‘Function’ in the Human Genome According to the Evolution-
free Gospel of ENCODE”, Genome Biology and Evolution, 5(3): 578–590.
doi:10.1093/gbe/evt028
• Green, Eric D., Mark S. Guyer, and National Human Genome Research Institute, 2011,
“Charting a Course for Genomic Medicine from Base Pairs to Bedside”, Nature 470(7333):
204–213. doi:10.1038/nature09764
• Gregory, T. Ryan, 2001, “Coincidence, Coevolution, or Causation? DNA Content, Cell
Size, and the C-value Enigma”, Biological Reviews, 76(1): 65–101. pmid:11325054
• –––, 2005, “Synergy Between Sequence and Size in Large-scale Genomics”, Nature
Reviews Genetics, 6(9): 699–708. doi:10.1038/nrg1674
• –––, 2007, “The Onion Test”, Evolver Zone Genomicron, April 25, 2007, Gregory 2007
available online.
• Griffiths, Paul E., 1992, “Adaptive Explanation and the Concept of a Vestige”, in Trees
of Life. Essays in Philosophy of Biology, Paul E. Griffiths (ed.), Dordrecht: Kluwer Academic
Publishers, pp. 111–131. doi:10.1007/978-94-015-8038-0_5
• –––, 1993, “Functional Analysis and Proper Functions”, British Journal of the
Philosophy of Science, 44(3): 409–422. doi:10.1093/bjps/44.3.409
• –––, 1994, “Cladistic Classification and Functional Explanation”, Philosophy of
Science, 61(2): 206–227. doi:10.1086/289796
• –––, 2006, “Function, Homology, and Character Individuation”, Philosophy of Science,
73(1): 1–25. doi:10.1086/510172
• Griffiths, Paul E. and Karola Stotz, 2006, “Genes in the Postgenomic Era”, Theoretical
Medicine and Bioethics, 27(6): 499–521. doi:10.1007/s11017-006-9020-y
• Guay, Alexandre and Thomas Pradeu (eds.), 2015, Individuals Across the Sciences,
Oxford: Oxford University Press.
• Hahn, Matthew W. and Gregory A. Wray, 2002, “The g-value Paradox”, Evolution and
Development, 4(2): 73–75. doi:10.1046/j.1525-142X.2002.01069.x
• Hall, Stephen S., 2010, “Revolution Postponed”, Scientific American, 303(4): 60–67.
doi:10.1038/scientificamerican1010-60
• Hamilton, Jennifer A., 2008, “Revitalizing Difference in the HapMap: Race and
Contemporary Human Genetic Variation Research”, The Journal of Law, Medicine & Ethics,
36(3): 471–477. doi:10.1111/j.1748-720X.2008.293.x
• Handelsman, Jo, Michelle R. Rondon, Sean F. Brady, Jon Clardy, and Robert M.
Goodman, 1998, “Molecular Biological Access to the Chemistry of Unknown Soil Microbes: A
New Frontier for Natural Products”, Chemistry & Biology, 5(10): R245–R249.
doi:10.1016/S1074-5521(98)90108-9
• [HapMap] The International HapMap Consortium, 2003, “The International HapMap
Project”, Nature, 426(6968): 789–796. doi:10.1038/nature02168
• –––, 2005, “A Haplotype Map of the Human Genome”, Nature, 437(7063): 1299–1320.
doi:10.1038/nature04226
• –––, 2010, “Integrating Common and Rare Genetic Variation in Diverse Human
Populations”, Nature, 467(7311): 52–58. doi:10.1038/nature09298
• Harrow, Jennifer, A. Frankish, J.M. Gonzalez, E. Tapanari, M. Diekhans, F.
Kokocinski, B.L. Aken et al., 2012, “GENCODE: The Reference Human Genome Annotation
for the ENCODE Project”, Genome Research, 22(9): 1760–1774. doi:10.1101/gr.135350.111
• Holm-Hansen, Osmund, 1969, “Algae: Amounts of DNA and Organic Carbon in Single
Cells”, Science, 163(3862): 87–88. doi:10.1126/science.163.3862.87
• International Human Genome Sequencing Consortium, 2001, “Initial Sequencing and
Analysis of the Human Genome”, Nature, 409(6822): 860–921. doi:10.1038/35057062
• –––, 2004, “Finishing the Euchromatic Sequence of the Human Genome”, Nature,
431(7011): 931–945. doi:10.1038/nature03001
• Jones, Peter A., 2012, “Functions of DNA Methylation: Islands, Start Sites, Gene
Bodies and Beyond”, Nature Reviews Genetics, 13(7): 484–492. doi:10.1038/nrg3230
• Jorde, Lynn B. and Stephen P. Wooding, 2004, “Genetic Variation, Classification and
‘Race’”, Nature Genetics, 36: S28–S33. doi:10.1038/ng1435
• Kalinowski, S.T., 2011, “The Computer Program STRUCTURE Does Not Reliably
Identify the Main Genetic Clusters Within Species: Simulations and Implications for Human
Population Structure”, Heredity, 106(4): 625–632. doi:10.1038/hdy.2010.95
• Karaca, Koray, 2013, “The Strong and Weak Senses of Theory-ladenness of
Experimentation: Theory-driven Versus Exploratory Experiments in the History of High-energy
Particle Physics”, Science in Context, 26(1): 93–136. doi:10.1017/S0269889712000300
• Kaufman, Jay S. and Richard S. Cooper, 2001, “Commentary: Considerations for Use
of Racial/Ethnic Classification in Etiologic Research”, American Journal of Epidemiology,
154(4): 291–298. doi:10.1093/aje/154.4.291
• Keller, Evelyn Fox, 2000, The Century of the Gene, Cambridge, MA: Harvard
University Press.
• –––, 2011, “Genes, Genomes, and Genomics”, Biological Theory, 6(2): 132–140.
doi:10.1007/s13752-012-0014-x
• Kellis, Manolis, et al., 2014, “Defining Functional DNA Elements in the Human
Genome”, Proceedings of the National Academy of Sciences, 111(17): 6131–6138.
doi:10.1073/pnas.1318948111
• Kitcher, Philip, 1994, “Who’s Afraid of the Human Genome Project?”, in PSA:
Proceedings of the Biennial Meeting of the Philosophy of Science Association, 2: 313–321.
doi:10.1086/psaprocbienmeetp.1994.2.192941
• –––, 1999, “Race, Ethnicity, Biology, Culture”, in Racism (Key Concepts in Critical
Theory), Leonard Harris (ed.), New York: Humanity Books, pp. 87–120.
• Koenig, Barbara A., Sandra Soo-Jin Lee, and Sarah S. Richardson (eds.), 2008,
Revisiting Race in a Genomic Age, New Brunswick: Rutgers University Press.
• Koshland, Daniel E. Jr, 1989, “Sequences and Consequences of the Human Genome”,
Science, 246(4927): 189. doi:10.1126/science.2799380
• Krieger, Nancy, 2000, “Refiguring ‘Race’: Epidemiology, Racialized Biology, and
Biological Expressions of Race Relations”, International Journal of Health Services, 30(1):
211–216. doi:10.2190/672J-1PPF-K6QT-9N7U
• Kuska, Bob, 1998, “Beer, Bethesda, and Biology: How ‘Genomics’ Came into Being”,
Journal of the National Cancer Institute, 90(2): 93. doi:10.1093/jnci/90.2.93
• Lander, Eric S., 1996, “The New Genomics: Global Views of Biology”, Science,
274(5287): 536. doi:10.1126/science.274.5287.536
• Lederberg, Joshua and Alexa T. McCray, 2001, “’Ome Sweet ’Omics—A Genealogical
Treasury of Words”, The Scientist, 15(7): 8.

• Ledford, Heidi, 2016, “AstraZeneca Launches Project to Sequence 2 Million
Genomes”, Nature, 532(7600): 427. doi:10.1038/nature.2016.19797
• Leonelli, Sabina, 2014, “What Difference Does Quantity Make? On the Epistemology of
Big Data in Biology”, Big Data & Society, 1(1). doi:10.1177/2053951714534395
• –––, 2016, Data-Centric Biology: A Philosophical Study, Chicago, IL: Chicago
University Press.
• Leroi, Armand Marie, 2005, “A Family Tree in Every Gene”, New York Times, March
14, A23.
• Levy, Samuel, et al., 2007, “The Diploid Genome Sequence of An Individual Human”,
PLoS Biology, 5(10): e254. doi:10.1371/journal.pbio.0050254
• Lewontin, Richard C., 1992, Biology as Ideology, New York: Harper Collins.
• –––, 2011, “It’s Even Less in Your Genes”, The New York Review of Books, 58(9).
• Li, W.H. and L.A. Sadler, 1991, “Low Nucleotide Diversity in Man”, Genetics, 129(2):
513–523.
• Lindblad-Toh, Kerstin, et al., 2011, “A High-resolution Map of Human Evolutionary
Constraint Using 29 Mammals”, Nature, 478(7370): 476–482. doi:10.1038/nature10530
• Livingstone, Frank B., 1962, “On the Non-Existence of Human Races”, Current
Anthropology, 3(3): 279–281. doi:10.1086/200290
• Lorenz, Patrick and Jürgen Eck, 2005, “Metagenomics and Industrial Applications”,
Nature Reviews Microbiology, 3(6): 510–516. doi:10.1038/nrmicro1161
• Lupski, James R., 1998, “Genomic Disorders: Structural Features of the Genome Can
Lead to DNA Rearrangements and Human Disease Traits”, Trends in Genetics, 14(10): 417–
422. doi:10.1016/S0168-9525(98)01555-8
• –––, 2009, “Genomic Disorders Ten Years On”, Genome Medicine, 1(4): 42.
doi:10.1186/gm42
• Luria, S.E., Dan M. Cooper, and Ari Berkowitz, 1989, “Human Genome Project”,
Science, 246(4932): 873–874. doi:10.1126/science.246.4932.873-b,
doi:10.1126/science.246.4932.873-d, doi:10.1126/science.2814503
• Mardis, Elaine R., 2006, “Anticipating the $1,000 Genome”, Genome Biology, 7(7):
112. doi:10.1186/gb-2006-7-7-112
• –––, 2011, “A Decade’s Perspective on DNA Sequencing Technology”, Nature,
470(7333): 198–203. doi:10.1038/nature09796
• Marks, Jonathan, 2008, “Race: Past, Present, and Future”, in Koenig 2008: 21–38.
• Marx, Vivian, 2013, “Biology: the Big Challenges of Big Data”, Nature, 498(7453):
255–260. doi:10.1038/498255a
• McClellan, Jon and Mary-Claire King, 2010, “Genetic Heterogeneity in Human
Disease”, Cell, 141(2): 210–217. doi:10.1016/j.cell.2010.03.032
• McFall-Ngai, M., et al., 2013, “Animals in a Bacterial World, a New Imperative for the
Life Sciences”, Proceedings of the National Academy of Sciences, 110(9): 3229–3236.
doi:10.1073/pnas.1218525110
• Millikan, Ruth Garrett, 1984, Language, Thought, and Other Biological Categories:
New Foundations for Realism, Cambridge, MA: MIT Press.
• –––, 1989a, “In Defense of Proper Functions”, Philosophy of Science, 56(2): 288–302.
doi:10.1086/289488
• –––, 1989b, “An Ambiguity in the Notion ‘Function’”, Biology and Philosophy, 4(2):
172–176.
• Mirsky, A.E. and Hans Ris, 1951, “The Desoxyribonucleic Acid Content of Animal
Cells and Its Evolutionary Significance”, The Journal of General Physiology, 34(4): 451–462.
doi:10.1085/jgp.34.4.451
• Moss, Lenny, 2003, What Genes Can’t Do, Cambridge, MA: MIT Press.
• –––, 2006, “Redundancy, Plasticity, and Detachment: the Implications of Comparative
Genomics for Evolutionary Thinking”, Philosophy of Science, 73(5): 930–946.
doi:10.1086/518778

• Myers, Eugene W., et al., 2000, “A Whole-Genome Assembly of Drosophila”, Science,
287(5461): 2196–2204. doi:10.1126/science.287.5461.2196
• National Research Council (Committee on Metagenomics: Challenges and Functional
Applications), 2007, The New Science of Metagenomics: Revealing the Secrets of Our
Microbial Planet, National Research Council Report 13, Washington DC: National Academies
Press. doi:10.17226/11902
• Neander, Karen, 1991, “Functions as Selected Effects: the Conceptual Analyst’s
Defense”, Philosophy of Science, 58(2): 168–184. doi:10.1086/289610
• Nicholson, D. and John Dupré (eds), forthcoming, Everything Flows: Towards a
Processual Philosophy of Biology, Oxford: Oxford University Press.
• NIH: National Institutes of Health, 2002, “International Consortium Launches Genetic
Variation Mapping Project: HapMap Will Help Identify Genetic Contributions to Common
Diseases”, NIH News Advisory, October 2002, NIH 2002 available online.
• –––, 2004, “NHGRI Seeks Next Generation of Sequencing Technologies New Grants
Support Development of Faster, Cheaper DNA Sequencing”, NIH News Release, October 14,
2004, NIH 2004 available online.
• –––, 2015, “NIH Framework Points the Way Forward for Building National, Large-
scale Research Cohort, a Key Component of the President’s Precision Medicine Initiative”,
NIH News Releases, September 17, 2015. NIH 2015 available online.
• –––, 2016, “What is a genome?”, Genetics Home Reference: Your Guide to
Understanding Genetic Conditions, NIH: U.S. National Library of Medicine, NIH 2016
available online, accessed October 10, 2016.
• Niu, Deng-Ke and Li Jiang, 2013, “Can ENCODE Tell Us How Much Junk DNA We
Carry in Our Genome?” Biochemical and Biophysical Research Communications, 430(4):
1340–1343. doi:10.1016/j.bbrc.2012.12.074
• Ohno, Susumu, 1972, “So Much ‘Junk’ DNA in Our Genome”, in Brookhaven
Symposium on Biology, 23, Routledge, pp. 366–370.
• O’Malley, Maureen A., 2007, “Exploratory Experimentation and Scientific Practice:
Metagenomics and the Proteorhodopsin Case”, History and Philosophy of the Life Sciences,
29(3): 337–358.
• –––, 2013, “Metagenomics”, in Encyclopedia of Systems Biology, W. Dubitzky, O.
Wolkenhauer, H. Yokota, and K.-H. Cho (eds.), Springer, p. 1283.
• O’Malley, Maureen A. and John Dupré, 2005, “Fundamental Issues in Systems
Biology”, BioEssays, 27(12): 1270–1276. doi:10.1002/bies.20323
• O’Malley, Maureen A. and Orkun S. Soyer, 2012, “The Roles of Integration in
Molecular Systems Biology”, Studies in History and Philosophy of Science Part C: Studies in
History and Philosophy of Biological and Biomedical Sciences, 43(1): 58–68.
doi:10.1016/j.shpsc.2011.10.006
• O’Malley, Maureen A., Kevin C. Elliott, and Richard M. Burian, 2010, “From Genetic
to Genomic Regulation: Iterativity in MicroRNA Research”, Studies in History and Philosophy
of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences,
41(4): 407–417. doi:10.1016/j.shpsc.2010.10.011
• Orgel, L.E. and F.H. Crick, 1980, “Selfish DNA: the Ultimate Parasite”, Nature,
284(5757): 604–607. doi:10.1038/284604a0
• Ossorio, Pilar N., 2005, “Race, Genetic Variation, and the Haplotype Mapping Project”,
Louisiana Law Review, 66(5): 131–143.
• Palca, Joseph, 1986, “The Numbers Game”, Nature, 321(6068): 371.
doi:10.1038/321371b0
• Pennisi, Elizabeth, 2012, “ENCODE Project Writes Eulogy for Junk DNA”, Science,
337(6099): 1159–1161. doi:10.1126/science.337.6099.1159
• Perini, Laura, 2011, “Sequence Matters: Genomic Research and the Gene Concept”,
Philosophy of Science, 78(5): 752–762. doi:10.1086/662565
• Phillips, Kathryn A., Mark J. Pletcher, and Uri Ladabaum, 2015, “Is the ‘$1000
Genome’ Really $1000? Understanding the Full Benefits and Costs of Genomic Sequencing”,
Technology and Health Care: Official Journal of the European Society for Engineering and
Medicine, 23(3): 373–379. doi:10.3233/THC-150900
• Pigliucci, Massimo and Jonathan Kaplan, 2003, “On the Concept of Biological Race
and Its Applicability to Humans”, Philosophy of Science, 70(5): 1161–1172. doi:10.1086/377397
• Piotrowska, Monika, 2009, “What Does it Mean to be 75% Pumpkin? The Units of
Comparative Genomics”, Philosophy of Science, 76(5): 838–850. doi:10.1086/605813
• Qu, Hongzhu and Xiangdong Fang, 2013, “A Brief Review on the Human
Encyclopedia of DNA Elements (ENCODE) Project”, Genomics, Proteomics & Bioinformatics,
11(3): 135–141. doi:10.1016/j.gpb.2013.05.001
• Reardon, Sara, 2015, “US Precision-medicine Proposal Sparks Questions”, Nature,
517(7536): 540. doi:10.1038/nature.2015.16774
• Rechsteiner, Martin C., 1991, “The Human Genome Project: Misguided Science
Policy”, Trends in Biochemical Sciences, 16: 455–461. doi:10.1016/0968-0004(91)90178-X
• Richardson, Sarah S. and Hallam Stevens (eds.), 2015, Postgenomics: Perspectives on
Biology After the Genome, Chapel Hill, NC: Duke University Press.
• Risch, Neil, Esteban Burchard, Elad Ziv, and Hua Tang, 2002, “Categorization of
Humans in Biomedical Research: Genes, Race and Disease”, Genome Biology, 3(7): 1–12.
• Roberts, Dorothy, 2011, Fatal Invention: How Science, Politics, and Big Business Re-
create Race in the Twenty-first Century, New York: The New Press.
• Rosenberg, Alex, 1994, “Subversive Reflections on the Human Genome Project”, in
PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association, 2: 329–
335. doi:10.1086/psaprocbienmeetp.1994.2.192943
• Rosenberg, Noah A., J.K. Pritchard, J.L. Weber, H.M. Cann, K.K. Kidd, L.A.
Zhivotovsky, and M.W. Feldman, 2002, “Genetic Structure of Human Populations”, Science,
298(5602): 2381–2385. doi:10.1126/science.1078311
• Rothfels, Klaus, Elizabeth Sexsmith, Margaret Heimburger, and Margarida O. Krause,
1966, “Chromosome Size and DNA Content of Species of Anemone L. and Related Genera
(Ranunculaceae)”, Chromosoma, 20(1): 54–74. doi:10.1007/BF00331898
• Sanger, F. et al., 1977, “Nucleotide Sequence of Bacteriophage ΦX174 DNA”, Nature,
265(5596): 687–695. doi:10.1038/265687a0
• Sboner, A., X.J. Mu, D. Greenbaum, R.K. Auerbach, and M.B. Gerstein, 2011, “The
Real Cost of Sequencing: Higher Than You Think”, Genome Biology, 12(8): 125.
doi:10.1186/gb-2011-12-8-125
• Scholz, Matthew B., Chien-Chi Lo, and Patrick SG Chain, 2012, “Next Generation
Sequencing and Bioinformatic Bottlenecks: The Current State of Metagenomic Data Analysis”,
Current Opinion in Biotechnology, 23(1): 9–15. doi:10.1016/j.copbio.2011.11.013
• Schwartz, Robert S., 2001, “Racial Profiling in Medical Research”, New England
Journal of Medicine, 344(18): 1392–1393. doi:10.1056/NEJM200105033441810
• Serre, David and Svante Pääbo, 2004, “Evidence for Gradients of Human Genetic
Diversity Within and Among Continents”, Genome Research, 14(9): 1679–1685.
doi:10.1101/gr.2529604
• Shiao, Jiannbin Lee, Thomas Bode, Amber Beyer, and Daniel Selvig, 2012, “The
Genomic Challenge to the Social Construction of Race”, Sociological Theory, 30(2): 67–88.
doi:10.1177/0735275112448053
• Sinsheimer, Robert L., 1989, “The Santa Cruz Workshop—May 1985”, Genomics,
5(4): 954–956. doi:10.1016/0888-7543(89)90142-0
• Sleator, Roy D., C. Shortall, and C. Hill, 2008, “Metagenomics”, Letters in Applied
Microbiology, 47(5): 361–366. doi:10.1111/j.1472-765X.2008.02444.x
• Staley, James T. and Allan Konopka, 1985, “Measurement of in Situ Activities of
Nonphotosynthetic Microorganisms in Aquatic and Terrestrial Habitats”, Annual Reviews in
Microbiology, 39(1): 321–346. doi:10.1146/annurev.mi.39.100185.001541
• Steinle, Friedrich, 1997, “Entering New Fields: Exploratory Uses of Experimentation”,
Philosophy of Science, 64(Proceedings): S65–S74. doi:10.1086/392587

• Stolley, Paul D., 1999, “Race in Epidemiology”, International Journal of Health
Services, 29(4): 905–909. doi:10.2190/QAAH-P5DT-WMP8-8HNL
• Stotz, Karola C., A. Bostanci, and Paul E. Griffiths, 2006, “Tracking the Shift to
‘Postgenomics’”, Public Health Genomics, 9(3): 190–196. doi:10.1159/000092656
• Streit, Wolfgan R. and Ruth A. Schmitz, 2004, “Metagenomics—The Key to the
Uncultured Microbes”, Current Opinion in Microbiology, 7(5): 492–498.
doi:10.1016/j.mib.2004.08.002
• Tang, Hua, et al., 2005, “Genetic Structure, Self-identified Race/Ethnicity, and
Confounding in Case-Control Association Studies”, The American Journal of Human Genetics,
76(2): 268–275. doi:10.1086/427888
• Tauber, Alfred I. and Sahotra Sarkar, 1992, “The Human Genome Project: Has Blind
Reductionism Gone Too Far?” Perspectives in Biology and Medicine, 35(2): 220–235.
doi:10.1353/pbm.1992.0015
• Thomas, C.A. Jr., 1971, “The Genetic Organization of Chromosomes”, Annual Review
of Genetics, 5(1): 237–256. doi:10.1146/annurev.ge.05.120171.001321
• Touchman, Jeffrey, 2010, “Comparative Genomics”, Nature Education Knowledge,
3(10): 13.
• Turnbaugh, Peter J. and Jeffrey I. Gordon, 2008, “An Invitation to the Marriage of
Metagenomics and Metabolomics”, Cell, 134(5): 708–713. doi:10.1016/j.cell.2008.08.025
• Turnbaugh, Peter J., et al., 2009, “A Core Gut Microbiome in Obese and Lean Twins”,
Nature, 457(7228): 480–484. doi:10.1038/nature07540
• Tyson, G.W., J. Chapman, P. Hugenholtz, E.E. Allen, R.J. Ram, P.M. Richardson, V.V.
Solovyev, E.M. Rubin, D.S. Rokhsar, and J.F. Banfield, 2004, “Community Structure and
Metabolism Through Reconstruction of Microbial Genomes from the Environment”, Nature,
428(6978): 37–43.
• Varmus, Harold, 2010, “Ten Years On—The Human Genome and Medicine”, New
England Journal of Medicine, 362(21): 2028–2029. doi:10.1056/NEJMe0911933
• Venter, J. Craig, 2000, “Remarks at the Human Genome Announcement”, Functional &
Integrative Genomics, 1(3): 154–155. doi:10.1007/s101420000026
• –––, 2010, “Multiple Personal Genomes Await”, Nature, 464(7289): 676–677.
doi:10.1038/464676a
• Venter, J. Craig, et al., 2001, “The Sequence of the Human Genome”, Science,
291(5507): 1304–1351. doi:10.1126/science.1058040
• Venter, J. Craig, et al., 2004, “Environmental Genome Shotgun Sequencing of the
Sargasso Sea”, Science, 304(5667): 66–74. doi:10.1126/science.1093857
• Visscher, Peter M., Matthew A. Brown, Mark I. McCarthy, and Jian Yang, 2012, “Five
Years of GWAS Discovery”, The American Journal of Human Genetics, 90(1): 7–24.
doi:10.1016/j.ajhg.2011.11.029
• Wade, Nicholas, 2010, “A Decade Later, Genetic Map Yields Few New Cures”, New
York Times, June 13, 2010, page 1.
• Wang, David G., et al., 1998, “Large-Scale Identification, Mapping, and Genotyping of
Single-Nucleotide Polymorphisms in the Human Genome”, Science, 280(5366): 1077–1082.
doi:10.1126/science.280.5366.1077
• Ward, Lucas D. and Manolis Kellis, 2012, “Evidence of Abundant Purifying Selection
in Humans for Recently Acquired Regulatory Functions”, Science, 337(6102): 1675–1678.
doi:10.1126/science.1225057
• Waters, C. Kenneth, 2007a, “The Nature and Context of Exploratory Experimentation:
An Introduction to Three Case Studies of Exploratory Research”, History and Philosophy of the
Life Sciences, 29(3): 275–284.
• –––, 2007b, “Causes that Make a Difference”, The Journal of Philosophy, 104(11):
551–579. doi:10.5840/jphil2007104111
• Weinberg, Robert A., 1991, “The Human Genome Initiative. There Are Two Large
Questions”, The FASEB Journal, 5(1): 78.
• –––, 2010, “Point: Hypotheses First”, Nature, 464(7289): 678.
doi:10.1038/464678a

• Wheeler, David A., et al., 2008, “The Complete Genome of An Individual by Massively
Parallel DNA Sequencing”, Nature, 452(7189): 872–876. doi:10.1038/nature06884
• White House, 2000, “Remarks Made by the President, Prime Minister Tony Blair of
England (via satellite), Dr. Francis Collins, Director of the National Human Genome Research
Institute, and Dr. Craig Venter, President and Chief Scientific Officer, Celera Genomics
Corporation, on the Completion of the First Survey of the Entire Human Genome Project”, June
26, White House 2000 available online.
• Whitfield, T.W., J. Wang, P.J. Collins, E.C. Partridge, S.F. Aldred, N.D. Trinklein,
R.M. Myers, and Z. Weng, 2012, “Functional Analysis of Transcription Factor Binding Sites in
Human Promoters”, Genome Biology, 13(9): R50, doi:10.1186/gb-2012-13-9-r50.
• Winkler, Hans, 1920, Verbreitung und Ursache der Parthenogenesis im Pflanzen- und
Tierreiche, Jena: Fischer Verlag.
• Witzig, Ritchie, 1996, “The Medicalization of Race: Scientific Legitimization of a
Flawed Social Construct”, Annals of Internal Medicine, 125(8): 675–679. doi:10.7326/0003-
4819-125-8-199610150-00008
• Wood, Alastair J., 2001, “Racial Differences in the Response to Drugs—Pointers to
Genetic Differences”, New England Journal of Medicine, 344(18): 1394–1396.
doi:10.1056/NEJM200105033441811
• Wright, Larry, 1976, Teleological Explanation: An Etiological Analysis of Goals and
Functions, Berkeley: University of California Press.
• Wright, Mathew W. and Elspeth A. Bruford, 2011, “Naming ‘Junk’: Human Non-
protein Coding RNA (NcRNA) Gene Nomenclature”, Human Genomics, 5(2): 90–98.
doi:10.1186/1479-7364-5-2-90
• Xie, Hong-Guang, Richard B. Kim, Alastair J.J. Wood, and C. Michael Stein, 2001,
“Molecular Basis of Ethnic Differences in Drug Disposition and Response”, Annual Review of
Pharmacology and Toxicology, 41(1): 815–850. doi:10.1146/annurev.pharmtox.41.1.815
• Yadav, Satya P., 2007, “The Wholeness in Suffix -omics, -omes, and the Word Om”,
Journal of Biomolecular Techniques, 18(5): 277. pmcid:PMC2392988
• Yudell, Michael, 2011, “A Short History of the Race Concept”, in Race and the Genetic
Revolution: Science, Myth, and Culture, Sheldon Krimsky and Kathleen Sloan (eds.), New
York: Columbia University Press, pp. 13–30.

Copyright © 2016 by
Stephan Guttinger
John Dupré
