HoytEtAl 2018 BioInformatics

18
Bioinformatics
ROBERT E. HOYT • WILLIAM R. HERSH • INDRA NEIL SARKAR
LEARNING OBJECTIVES
After reading this chapter the reader should be able to:
• Define bioinformatics, translational bioinformatics, and other bioinformatics-related terms
• State the importance of bioinformatics in future medical treatments and prevention
• Describe genomics and its important implications for health care
• List major private and governmental bioinformatics databases and projects
• Enumerate several bioinformatics projects that involve electronic health records
• Describe the application of bioinformatics in genetic profiling of individuals and large populations
INTRODUCTION

This chapter is focused on “bioinformatics,” the study of data and information as it relates to knowledge within the context of the life sciences. Bioinformatics traces its formal beginning to 1970, when the term was first introduced in scientific literature.1 In many ways, bioinformatics has evolved in parallel with health informatics. Significant advances in bioinformatics have given rise to contemplation of its applications within the context of biomedicine and health (the sub-discipline of biomedical informatics referred to as “translational bioinformatics”).

Definitions

The chapter begins with common definitions and the next section provides a short primer on genomics, which underpins many of the concepts used for bioinformatics within the context of health.

Bioinformatics has been defined as, “the field of science in which biology, computer science and information technology merge to form a single discipline.”2 Bioinformatics makes use of fundamental aspects of computer science (such as databases and artificial intelligence) to develop algorithms for facilitating the development and testing of biological hypotheses, such as: finding the genes of various organisms, predicting the structure or function of newly developed proteins, developing protein models and examining evolutionary relationships.3-4 A related term is computational biology, which refers to the computational aspects of molecular biology. Translational bioinformatics focuses on the “development of storage, analytic and interpretive methods to optimize the transformation of increasingly voluminous biomedical data into proactive, predictive, preventive and participatory health.”5 Simply put, translational bioinformatics is the specialization of bioinformatics for human health.

Bioinformatics is sometimes said to work with the various “omes” and “omics.” They include:
• Genomics - the study of genetic material in an organism (e.g., the genes that may be associated with a disease).
• Proteomics - the study at the level of proteins (e.g., through the components, structure, and functions).
• Pharmacogenomics - the study of genetic material in relationship to drug targets.
• Metabolomics - the study of genes, proteins or metabolites.
• Interactome - biomolecular pathways and interactions of proteins
• Microbiome - microorganisms inhabiting an individual
• Exposome – environmental factors to which an organism is exposed
• Bibliome – the literature of science
• Metagenomics - the analysis of genetic material derived from complete microbial communities harvested from natural environments.6

In addition, bioinformatics studies the relationship between the genotype, which is the genetic information that is associated with biological function7 and the phenotype, which is the observable characteristic, structure, function and behavior of a living organism. Examples of phenotypes include hair color, height, and development of diseases. The phenome refers to the total phenotypic traits.

GENOMIC PRIMER

The human body has about 100 trillion cells and each one contains a complete set of genetic information (chromosomes) in the nucleus; exceptions are eggs, sperm, and red blood cells. Humans have 23 pairs of chromosomes in each cell, including an X and a Y chromosome for males and two Xs for females. Offspring inherit one chromosome of each pair from each parent. Chromosomes are listed approximately by size with chromosome 1 being the largest and chromosome 22 the smallest. Organisms have differing numbers of chromosomes (e.g., our closest extant primate relatives, chimpanzees, have 24 pairs). Chromosomes consist of double twisted helices of deoxyribonucleic acid (DNA). DNA is composed of four sugar-based building blocks (“nucleotides”: adenine [A], thymine [T], cytosine [C], and guanine [G]) that are generally found in pairs (following “Watson-Crick” pairing templates: A-T, C-G). DNA is often referred to as the “blueprint for life.” As such, a given organism’s DNA encodes its full complement of proteins essential for cellular function. Some of the encoding of DNA also enables it to control the expression of proteins or affect how other portions of DNA may be decoded based on a biological context (e.g., to accommodate for faulty DNA decoding or DNA damage that may be encountered due to environmental phenomena). Genes are regions on chromosomes that encode instructions, which may result in proteins that then in turn enable biological functions. The process of decoding genes involves transcribing the DNA into ribonucleic acid (RNA) and then translation into amino acids that form the building blocks for proteins (Figure 18.1). Collectively, the complete set of genes is referred to as the “genome” (based on the combination of the terms “gene” and “chromosome”).

It is estimated that humans have between 20,000 and 30,000 genes and that genomes are about 99.9% similar between individuals. Variations in genomes between individuals are known as single nucleotide polymorphisms (SNPs) (pronounced “snips”). There are three general types of alterations: single base-pair changes, insertions or deletions of nucleotides, and reshuffled DNA sequences. As an example, one individual might have a chromosome with the sequence TGGC, while another might have the sequence TAGC. Each of these is referred to as an allele. Although SNPs are common, their significance is complex and difficult to decipher.8-10

Another type of genetic variation is copy number variations (CNVs). These are repeats of DNA sequences of 50 nucleotides or longer. There may be many CNVs in an individual’s genome. Anywhere from 4.8% to 9.5% of the human genome is CNVs. Some of these copies of genomes are deleterious, others are not. When they are deleterious, they are so-called unbalanced rearrangements, involving either loss or gain of segments of the genome.11

An additional cause of genetic variation is epigenetics, which is the variation in the phenotype or gene expression that is caused by mechanisms other than DNA sequence differences.12 The molecular mechanisms for epigenetics, such as DNA methylation, are beginning to be unraveled.13 Epigenetics shows that the environment influences the expression of genes, thereby leading to genetic variation.

A great deal of progress has been made with genetic testing and our understanding of the human genome and genetic variations. Genome-wide association studies (GWAS) look at associations between genomic variants and traits of the phenotype.14 The variations or SNPs discovered are said to be associated with the disease, but true cause and effect cannot be ascertained.15 Similarly, phenome-wide association studies (PheWAS) are being carried out comparing genes to disease associations, most recently using the electronic health record for phenotypical information.16

Genetic material can be obtained from blood, saliva, skin and hair samples. Full genome sequencing has historically been an expensive, time-consuming, and complicated process, although the cost of full genome sequencing has dropped to approximately $1000 (US). SNP-based genomic profiling is available now for less than $200 US (e.g., 23andMe). This cost differential is largely because SNP genotyping analyzes about 0.1 – 0.2% of the genome in contrast to every single nucleotide. Even more cost effective are ultra-low-coverage (ULC) sequencing techniques that analyze the same 10-20% of the genome and cost $60 US. SNP-based genotyping has more specificity since it uses a technique that seeks to identify SNPs from a library of a priori selected SNPs of interest (using a technology called “microarrays”);
by contrast ULC techniques may be more sensitive, but their reproducibility may be difficult to ensure. Therefore, ULC techniques may serve the role of SNP discovery, whereas SNP-based genotyping can serve the role of SNP identification.

Figure 18.1: Genes (Courtesy of National Institute of General Medical Sciences, National Institutes of Health)

Often, we do not want to know just the genome sequence but the amount that the genes are actually expressed.17 Some call the genome itself a “parts list,” whereas what we want to know is how much those parts are used. One of the early techniques for measuring gene expression was the gene microarray. These are chips that allow measurement of many different biological substances, not only DNA and RNA, but also protein. When nucleotides are measured there are 25-base strands. Since most genes are longer than 25 bases, there is more than one spot per gene. In fact, a typical microarray chip has 1 to 2 million of these spots, so many different genes or transcriptions of RNA of those genes can be measured on microarrays. The technology has now become very easy to mass produce. The initial application of microarrays was to measure gene expression, i.e., the mRNA that had been transcribed. But other applications of microarrays have emerged, including the sequencing of unknown DNA. Microarrays are being supplanted by a newer technology called RNA-seq, which measures mRNA more accurately.

IMPORTANCE OF TRANSLATIONAL BIOINFORMATICS

Besides diagnosing the 3,000 to 4,000 hereditary diseases that are currently known, bioinformatics techniques may be helpful to discover future drug targets, develop personalized drugs based on genetic profiles and develop gene therapies to treat diseases with a strong genomic component, such as cancer. One approach that has been explored to enable gene therapies involves the use of genetically altered viruses that carry human DNA. This approach, however, has not been definitively shown to work and has not been approved for general use by the FDA. Recent years have seen significant advances in genome editing using a technology called CRISPR/Cas9 gene editing. However, while manipulation of genomes in other organisms, such as microbes, has shown promise for energy production (“bio-fuels”), environmental cleanup, industrial processing, and waste reduction, clinical applications are still nascent in their design and approval. Nonetheless, the advent of technologies such as CRISPR/Cas9 suggests that genome editing will become a viable path for treatment of genetically based conditions within the next decade.18

Figure 18.2: Translational bioinformatics (Adapted from Sarkar et al)19

This chapter will deal primarily with translational bioinformatics (TBI), the area of bioinformatics that is focused primarily on the use of bioinformatics approaches to address challenges in biomedicine and health. A significant goal of TBI is to enable bi-directional crossing of the translational barrier between the research bench and the bedside in the medical clinic. With growing genome-wide and population-based research data sets, more genotype-phenotype associations are being uncovered that potentially can detect and treat diseases with a genetic component earlier. Such associations may also help create tailor-made drugs for higher efficacy. Figure 18.2 demonstrates the bidirectional nature of data and information flow between bioinformatics and health informatics. The emergence
of translational bioinformatics is primarily due to the rapid advances in both sub-disciplines, and the realization of the potential to leverage biological data within the context of clinical care. In other words, a variety of advances in bioinformatics, such as faster and cheaper DNA sequencing, and more widespread adoption of electronic health records have made this possible.

Pharmacogenomics is an illustrative example of how translational bioinformatics can be used within the context of pharmaceutical development to make use of genomic information for better drug discovery and utilization. Drug companies are faced with the huge expense of drug development, the long road to producing a new drug and expiring patents. Drug failures are common and can be due to a complex combination of a lack of clinical efficacy, side effects and commercial issues. Unfortunately, animal models are often inadequate for the development and evaluation of drugs for treating human conditions. It is thus the goal to use genetic information for:
• New indications for an old drug (repurposing)
• New targets for existing drugs (e.g., treatment of tongue cancer using RET inhibitors)
• Drugs to work better in certain patient groups (gender, age, race, ethnicity, etc.) with possible genetic variants
• Knowing ahead of time what drugs to avoid due to higher incidence of side effects that are genetically modulated
• Develop clinical decision support in electronic health records based on pharmacogenomics20-21

Multiple projects are underway to integrate genetic and clinical data that will be discussed later in the chapter. Electronic health records (EHRs) and health information exchange (HIE) efforts, which are rapidly becoming ubiquitous, thanks in large part to federal incentives, are poised to contribute massive amounts of patient information (including demographic, laboratory, and clinical data). It is important to also note that in addition to genomic and clinical data, environmental data may offer valuable insights into the understanding and eventual treatment of disease.

Another important application of translational bioinformatics is in cancer genomics. There have been so-called hallmarks papers that describe the state of the knowledge on how cancer cells proliferate and evade death by a number of mechanisms within living organisms.22 When whole genome sequencing is applied to tumor cells, it shows that they undergo genomic changes that give them the ability to proliferate and metastasize within living organisms.23 It has also been discovered that cancer gene mutations, within tumors, are heterogeneous, so cancer cells evolve and attain this ability to proliferate outside the normal defense mechanisms of organisms.24 Another important discovery is that some genomic changes representing common tumor types occur in different cancer locations, so that the same mechanism may occur in more than one type of cancer.25-26 There has also been improved understanding of gene function, and of course this may aid in the development of better treatments.27

BIOINFORMATICS PROJECTS AND CENTERS

The Human Genome Project

One of the greatest accomplishments in biomedicine was the completion of the Human Genome Project (HGP). This international collaborative project, sponsored by the US Department of Energy and the National Institutes of Health, was started in 1990 and finished in 2003. In the process of acquiring the human genome (as a complete set of DNA sequences, encompassing all 23 chromosomes), genome sequences for several other key organisms (“model” organisms) were also acquired. These included the Escherichia coli bacterium, fruit fly (Drosophila melanogaster), and house mouse (Mus musculus). By mid-2007 about three million differences (SNPs) had been identified in human genomes. Appreciating the potential significant societal impact, the HGP also addressed the ethical, legal and social issues associated with the project. Since the completion of the HGP, attention is now more focused on the development of approaches to analyze and learn from volumes of data representing increasing numbers of individuals.11,28-29 These analyses include the annotation of information associated with disease onto chromosomes. Figure 18.3 displays the DNA sequencing of just chromosome number 12. Huge relational databases are necessary to store and retrieve this information. New technologies continue to emerge that reduce the necessity to sequence an entire human genome, such as DNA arrays (gene chips) that help speed the analysis and comparison of DNA fragments.30 The cost of the HGP was close to $3 billion; but over time, costs have dramatically dropped for genetic analysis.7

National Human Genome Research Institute (NHGRI)

NHGRI is an NIH institute that provides many educational resources on their web site. Like other NIH institutes, they conduct and fund research within their intramural division, as well as support extramural research with external partners. Their health section has
multiple resources for patients and healthcare professionals with emphasis on the Human Genome Project. The “Issues in Genetics” section covers important controversies in policy, legal and ethical issues in genetic research. They include a large glossary (200+) of genetics-related definitions, also available as a software app for the iPhone and iPad.

Figure 18.3: Chromosome 12 (Courtesy of the National Library of Medicine)

In 2003, NHGRI launched the Encyclopedia of DNA Elements (ENCODE) Project. ENCODE is comprised of a consortium of laboratories with the goal to study and characterize the functional elements of the human genome. All ENCODE data are free for research purposes. In 2012, 1640 data sets were published, which continue to produce controversy. For example, ENCODE researchers posited that 80% of the human genome is active and performing a role (and thus not “junk” DNA as has been previously thought).31

Human Microbiome Project (HMP)

It is estimated that less than 0.01% of microbes on Earth have been cultured, characterized, and sequenced. As an exception, the complete genome for the common human parasite Trichomonas vaginalis was reported in 2007 in the journal Science.32 The HMP is an NIH-sponsored initiative that catalogued the myriad of organisms that co-exist with humans and heretofore have been rarely studied (e.g., flora from oral, nasal, skin, and the gastrointestinal tract). It is important to note that microbial cells on the human body outnumber human cells by a factor of 10 to 1. Initial efforts were aimed at identifying the microbiome in healthy patients. More recently, extensive work has been done to identify the microbiome in multiple disease states with results too comprehensive to cite in this chapter. Three areas the HMP is currently focusing on include pregnancy and pre-term birth, onset of inflammatory bowel disease and onset of type 2 diabetes.

The HMP used metagenomics, as explained in the definitions section. As detailed on the HMP web site, their goals were as follows:
• Determine whether individuals share a core human microbiome
• Understand whether changes in the human microbiome can be correlated with changes in human health
• Develop new technological and bioinformatics tools needed to support these goals
• Address the ethical, legal and social implications raised by human microbiome research6
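
The first goal listed above, whether individuals share a core microbiome, can be pictured as a set computation over per-person lists of detected organisms. The following is a minimal sketch under stated assumptions and not the HMP's actual method: the subject IDs, the genus lists, and the `core_taxa` helper with its `min_fraction` prevalence threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def core_taxa(samples, min_fraction=1.0):
    """Return the taxa present in at least `min_fraction` of the samples.

    `samples` maps a (hypothetical) subject ID to the set of taxa detected
    in that subject's metagenomic sample.
    """
    if not samples:
        return set()
    # Count, for each taxon, how many subjects carry it.
    counts = Counter(taxon for taxa in samples.values() for taxon in set(taxa))
    threshold = min_fraction * len(samples)
    return {taxon for taxon, n in counts.items() if n >= threshold}


# Toy data invented for illustration only (not HMP results).
toy_samples = {
    "subject_1": {"Bacteroides", "Prevotella", "Faecalibacterium"},
    "subject_2": {"Bacteroides", "Faecalibacterium", "Ruminococcus"},
    "subject_3": {"Bacteroides", "Prevotella", "Ruminococcus"},
}

strict_core = core_taxa(toy_samples)          # taxa found in every subject
relaxed_core = core_taxa(toy_samples, 0.6)    # taxa found in at least 60% of subjects
print(sorted(strict_core))
print(sorted(relaxed_core))
```

A strict definition (present in everyone) and a relaxed prevalence threshold can give very different answers on the same data, which is one reason the question of a shared core is framed as something to be determined rather than assumed.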
Human Variome Project

This is an international non-governmental organization that began in 2006 with the goal to create systems and standards for storage, transmission and use of genetic variations to improve health. Rather than catalogue “normal” genomes they focus on the abnormalities that cause disease. Another aspect of their vision is to provide free public access to their databases.33

The PhenX Project

The goal of this project is to identify 15 high-quality, well established measures and standards for each of 21 research domains. Standardization is important so that phenotypes, risk factors and environmental exposures can be compared. For example, if everyone used a common set of standards, data could be more readily compared or combined to gain more statistical power.34-35

1000 Genomes Project

This is an international initiative with the goal to catalogue and study the genomes of 2500 individuals from 26 populations, looking for genetic variations that occur at a frequency of about 1%. The data will be free for researchers and hosted on Amazon’s Web Services. The project produced 200 terabytes of data during its active phase from 2008 to 2015.36

Pediatric Cancer Genome Project

St. Jude Children’s Research Hospital-Washington University created this initiative to combat childhood cancer. Data generated from 800 subjects will be offered free to researchers. As an example of a positive result from this project, two gene variations associated with 50% of low grade gliomas (brain tumors) were identified.37

Global Alliance for Genomics and Health

In June 2013, a Global Alliance was formed to share genetic and clinical information. The Alliance was formed by 500 international medical and research organizations. They develop standards for sharing genetic information, information technology platforms with open standards and patient consent policies.38

Pharmacogenomics Knowledge Base (PharmGKB)

This Stanford University based resource catalogues the relationships between genes, diseases and drugs. There are sections on drugs, pathways, dosing guidelines and drug labels. Information is downloadable.39

Framingham Heart Study SHARe Genome-Wide Association Study

In 2007, the Framingham Heart Study began a new phase by genotyping 17,000+ subjects as part of the FHS SHARe (SNP Health Association Resource) project. The SHARe database is located at NCBI’s dbGaP and will contain 550,000 SNPs and a vast array of phenotypical (combined characteristics of the genome and environment) information available in all three generations of FHS subjects. These will include measures of the major risk factors such as systolic blood pressure, total, LDL and HDL cholesterol, fasting glucose, and cigarette use, as well as anthropometric measures such as body mass index, biomarkers such as fibrinogen and C-reactive protein (CRP) and electrocardiography (EKG) measures such as the QT interval. Because of this initiative, they have been able to publish multiple articles on genetic associations and heart disease.40

The Mayo Clinic Bipolar Disorder Biobank

Researchers at the Mayo Clinic and other institutions are analyzing the genetic and clinical information on 2000 patients in their biobank to determine genetic aspects of bipolar disorder. It is hoped that data generated from this project will lead to earlier and better treatment of this mental health disorder.41

Informatics for Integrating Biology and the Bedside (i2b2)

i2b2 is a National Institutes of Health National Center for Biomedical Computing initiative located at Harvard Medical School. The Center has developed open source software that will enable investigators to mine existing clinical data for research. Specifically, they will develop a scalable computational framework to speed up translation of genetic findings into healthcare. There are multiple member institutions, including international ones. The project was designed to allow users to query a system-wide de-identified repository for a set of patients meeting certain inclusion or exclusion criteria. On the web site, users can download client-software, client-server software and the source code.42 The i2b2 infrastructure has been shown to be generalizable to multiple sites for a range of clinical conditions.43

The Observational Health Data Sciences and Informatics (OHDSI)

OHDSI is a multi-stakeholder, international collaboration that aims to make the best use of available large
health data sets. Health data exist in a variety of file formats and often require standardization for both use and supporting data integration tasks. From the perspective of translational bioinformatics, OHDSI represents a major initiative that provides datasets that can be integrated with biological data. For example, within the context of studying drugs, it is useful to know reported adverse events. While the US FDA does make available data collected from its adverse event reporting system (FAERS), these data are challenging to use for large scale analyses. AEOLUS is an artifact of an OHDSI initiative (LAERTES) that standardizes FAERS data such that it can be used within a range of studies, including those involving translational bioinformatics.44

All of Us

Conceptualized at the end of the Obama administration, the All of Us program (which is a key element of the Precision Medicine Initiative) is a White House initiative to develop a US cohort of individuals that will support the development and evaluation of next-generation, personalized treatments. The All of Us program is built around the fundamental tenets of precision medicine, i.e., the right treatment for the right patient at the right time. In this way, the success of this endeavor, which is just formally launching at the writing of this text, will result in a paradigm shift in how biological data can be used to inform health care in a meaningful and efficient manner.45

The Cancer Genome Atlas (TCGA)

The suite of conditions that comprise cancer has a strong genomic component. The TCGA project is a joint initiative between the National Human Genome Research Institute and the National Cancer Institute, both of the National Institutes of Health, that aims to provide one of the largest collections of cancer-related genomic data. Currently, the initiative focuses on 33 cancers and the TCGA provides access to genomic data associated with tumor samples gathered from more than 11,000 patients.46

Bioinformatics Data and Information Resources

The bioinformatics community has produced a wide variety of knowledge-based information resources that not only provide access to research results but also facilitate scientific discovery from genomics. Much of this activity is led by the National Center for Biotechnology Information (NCBI), which is part of the National Library of Medicine. A catalog of NCBI and other resources is provided annually in the open-access journal Nucleic Acids Research. In fact, there is an annual database issue that covers the myriad of databases available. There is also a web server issue that covers resources that can be accessed over the Internet. NCBI was created in 1988 and hosts dozens of databases associated with biomedicine, including the popular MEDLINE and GenBank databases. NCBI provides access to sequences from over 500,000 organisms (via GenBank), including the complete genomes of thousands of organisms (via NCBI Genome). Genomes represent both completely sequenced organisms and those for which sequencing is still in progress. Popular NCBI databases, which are linked by a common interface (Entrez), are listed in Figure 18.4. On the Genome project web site one can search for specific genes or proteins from different species. Figure 18.5 shows the result of a search for the tumor protein TP53.

The NCBI site also provides access to BLAST+ (new Basic Local Alignment Search Tool) that enables the identification of significantly related (based on an “expectation” value or “e-value”) nucleotide or protein sequences from within the protein and nucleotide databases.47 Magic BLAST is a recently developed tool that enables rapid searching for related sequences (e.g., as might arise from a full sequence of a human being) to a reference genome.48 Also available on the NCBI site are databases where one can find data about a gene (Gene), its location on the chromosome (MapViewer), its variants (dbVar), and data about patients who have the gene or its variants (dbGaP).

GenBank

This database was established in 1982 and is the NIH sequence database that is a collection of all publicly available DNA sequences. Along with EMBL (Europe) and DDBJ (Asia), GenBank is a member of the International Nucleotide Sequence Database Collaboration (INSDC), which provides free access to sequence data from nearly anywhere with an Internet connection. Interestingly, many biological and medical journals now require submission of sequences to a database prior to publication, which can be done with NCBI tools such as BankIt.49

The Online Mendelian Inheritance in Man (OMIM)

Originally an NCBI database but now a standalone resource, Online Mendelian Inheritance in Man (OMIM) is a database of genetic data and human genetic disorders. It was originally developed by Johns Hopkins University and Dr. Victor McKusick, a pioneer in genetic metabolic abnormalities. It includes an extensive reference section linked to PubMed that is continuously updated.50
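
As a supplement to the BLAST+ description in this section, the “expectation” value can be made concrete with a short calculation. The chapter does not give a formula; the sketch below uses the standard Karlin-Altschul statistic, E = K·m·n·e^(−λS), where S is the raw alignment score and m and n are the query and database lengths. The λ and K values and the two lengths are illustrative assumptions only (real values depend on the scoring system, and BLAST itself applies length corrections that are omitted here), and the function names are hypothetical.

```python
import math


def expect_value(raw_score, query_len, db_len, lam, k):
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalized (bit) score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative parameters only; real values depend on the scoring system.
LAMBDA, K = 0.318, 0.134
# Hypothetical query length and total database length, in residues.
QUERY_LEN, DB_LEN = 250, 50_000_000

for score in (40, 80, 120):
    e = expect_value(score, QUERY_LEN, DB_LEN, LAMBDA, K)
    bits = bit_score(score, LAMBDA, K)
    print(f"raw score {score:>3}: bit score {bits:6.1f}, E = {e:.3g}")
```

The e-value is the number of alignments with at least that score expected by chance in a search of that size, so it falls exponentially as the score rises and grows in proportion to the size of the database searched. This is why the same alignment can look more or less significant depending on which database it was found in.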
Figure 18.4: NCBI Databases (Courtesy National Library of Medicine)

Figure 18.5: Entrez search for tumor protein (Courtesy National Library of Medicine)

World Community Grid

This project was launched by IBM in 2004 and simply asked people to donate idle computer time. By 2007 over 500,000 computers were involved in creating a super-computer used in bioinformatics. Projects include Help Defeat Cancer, Fight AIDS@Home, Genome Comparison and Human Proteome Folding projects. This grid promises to greatly expedite biomedical research by analyzing complex databases more rapidly.51

PERSONAL GENOMICS

The availability of population-based genetic data, the decreasing cost for human genome determination and the availability of commercial personal genetic testing companies provide greater personal uses of genomic data.

Population Studies: There are several ongoing initiatives that will leverage genomic data in the context of population studies. For instance, Oracle Corporation has partnered with the government of Thailand to develop a database to store medical and genetic records. This initiative was undertaken to offer individualized “tailor made” medications and to offer bio-surveillance for future outbreaks of infectious diseases such as avian influenza.52 Not all such initiatives have been successful. Perhaps the best known is DeCODE Genetics Corporation, which aimed to collect disease, genetic and genealogical data for the entire population of Iceland; however, it filed for Chapter 11 bankruptcy in 2009.53 Nonetheless, DeCODE continues the development of personal genomics based solutions, largely in partnership with organizations such as Pfizer.

Decreasing Cost of Human Genome Determination: Coinciding with the completion of the HGP, the NHGRI has kept track of the cost to perform DNA sequencing of an entire human genome over the past decade. As Figure 18.6 indicates, the cost has dropped from an initial cost of $100,000,000 to a current cost of less than $1,000 per genome in 2017. Notably, the decrease in the cost of genome sequencing is exceeding Moore’s Law (attributed to Intel co-founder Gordon Moore, it states that the cost of computing power will be halved every 18 months based on advances in technology).11

Figure 18.6: Cost per Genome over time (Courtesy National Human Genome Research Institute, National Institutes of Health)11

Personal Genetics Testing: Many patients may want to know their own genetic profile, even if the consequences are uncertain. The following are examples of personal genetics companies (“direct to consumer genomics”):
• AncestryDNA is a separate service offered by Ancestry.com. Its analysis provides ethnicity estimates and identifies remote relatives. A saliva sample is needed, the cost is $79 and the turnaround time is six to eight weeks.54
• 23andMe is a direct-to-consumer online genetic testing company. For $199 it mails a saliva-based testing kit to the customer’s home, with a turnaround time of four to six weeks. Currently, it looks for 240 diseases, multiple carrier states and drug response conditions (a substantial increase in the last two years). It also offers an analysis of ancestry based on the genetic profile.55 In 2010 a genome-wide association study (GWAS) was published that used this technology and showed that patient questionnaire results correlated well with genetic results. Additionally, the authors described five new genotype-phenotype associations, involving freckling, photic sneeze reflex, hair curl and failure to smell asparagus.56 Google’s co-founder Sergey Brin was one of the early funders of 23andMe, focusing on a project through the company to study the genetic inheritance of Parkinson’s disease. The project hopes to recruit 10,000 subjects from various organizations and offer a discounted price for complete analysis. In late 2013 the FDA instructed the company to stop performing genetic analyses for medical conditions until it received 510(k) (pre-market) clearance, which it subsequently received.57 In 2017, the company offered a Health and Ancestry analysis for $199 and an Ancestry analysis for $99. The Health analysis includes genetic risk (e.g., celiac disease, macular degeneration), wellness reports, trait reports (e.g., eye color, skin pigment) and carrier status reports (e.g., polycystic kidney disease, cystic fibrosis).
• Myriad™ specializes in genetic testing for cancers with a hereditary component, such as breast, endometrial, melanoma, ovarian, colon, prostate, gastric and pancreatic cancer.58 A landmark Supreme Court decision in 2013 determined that Myriad could not patent naturally occurring BRCA genes.59

As pointed out by Harold Varmus (American Nobel Prize winner and former director of both the NIH and the NCI), personal genetics “is not regulated, lacks external standards for accuracy, has not demonstrated economic viability or clinical benefit and has the potential to mislead customers.”60 For genetics to enter the mainstream, new technologies and specialties will need to be developed and numerous ethical questions will arise. Finding the abnormal gene or SNP is only the starting point. Understanding the sensitivity and specificity of genetic tests within clinical contexts will be essential for them to be accepted. In general, patients may not be willing to undergo major procedures (e.g., a prophylactic mastectomy or prostatectomy to prevent cancer) unless the genetic testing is nearly perfect. It is also important that genetic counseling be available to help patients understand the implications of genetic susceptibility tests (versus a genetic guarantee of disease, such as the mutations associated with Huntington’s disease).

Additionally, the Genetic Information Nondiscrimination Act of 2008 was passed to protect patients against discrimination by employers and healthcare insurers based on genetic information. Specifically, the Act prohibits health insurers from denying coverage to a healthy individual or charging that person higher premiums based solely on genetic information, and bars employers from using individuals’ genetic information when making decisions related to hiring, firing, job placement, or promotion.61

Many obstacles face the routine ordering of genetic tests by the average patient. Ioannidis et al. pointed out that for genetic testing to be reasonable several conditions must be true. The disease of interest must be common. Even with breast cancer, when seven established genetic variants are evaluated, they explain only about 5% of the risk for the cancer. If the disease (e.g., Crohn’s disease) is rare, then the test must be highly predictive. For genetic testing to be relevant there should be an effective treatment to offer; otherwise there is little benefit. The test must be cost effective, and many currently are too expensive. As an example, screening for sensitivity to the blood thinner warfarin (Coumadin) makes little sense now due to cost.62

A 2010 Lancet commentary warned of additional concerns. Whole-genome sequencing will generate a tremendous amount of information that the average physician and patient will not understand without extensive training. At this point, health care lacks adequate numbers of geneticists and genetic counselors who understand the implications of the data being made available by continued advances in biotechnology. Patients will need to sign an informed consent acknowledging that many of the findings will have unclear meaning. They will have to deal with the fact that they may be found to be carriers of certain diseases that may have an impact on childbearing, etc. Genetic testing may cause many further tests to be ordered, thus leading to increased healthcare expenditures. As more information about whole-genome sequencing is gained, more patients will desire it, but who will pay for it? And can the costs be justified?63

Two other articles drive home additional practical points. When the risk of cardiovascular disease based on the chromosome 9p21.3 abnormality was evaluated
