ch4 Visualizing Personal Genomics
CHAPTER 4
Key points
• Visualization is a critical aspect of personal genomics due to the quantity and richness of the data.
• Visualization modalities can both enhance and limit the interpretation of personal genomic information.
• Different visualization techniques are used to facilitate distinct aspects of exploring personal genomic information; e.g.
visualizations for clinical interpretation will be distinct from those facilitating biological discovery and interpretation.
Exploring Personal Genomics. First Edition. Joel T. Dudley and Konrad J. Karczewski.
© Joel T. Dudley and Konrad J. Karczewski 2013. Published 2013 by Oxford University Press.
OUP CORRECTED PROOF – FINAL, 11/29/2012, SPi
VISUALIZING PERSONAL GENOMICS
[Figure 4.1 graphic: "Elevated Risk" table; visible row: Ankylosing Spondylitis]
Figure 4.1 Tabular view of genetic disease risk: This tabular view of genetic disease risk is generated as part of the analysis provided by a DTC
personal genomics service. This view offers a fairly concise summary of the increased disease risks assessed from personal genomic information,
which are clearly ranked according to overall risk estimates. However, this view offers limited interactivity and does not reveal relationships
between individual diseases (e.g. comorbid conditions or shared risk alleles). Image from 23andMe genetic testing service.
techniques will make continued progress, we hope that readers of this book, from all backgrounds and capabilities, might become inspired by the information in this chapter to add their own unique contributions to personal genome visualization.
4.2 Tabular views
Although tabular views, where information is organized into a table-like format, may seem a bit pedestrian from the standpoint of data visualization, they are used frequently to present personal genomic information. Customers of direct-to-consumer (DTC) genomics services are typically greeted with tabular views of their data when they view their results using the web interface provided by the DTC company (Figure 4.1). These interfaces often use tabular views to provide a high-level summary of annotated personal genomic information, such as a list of diseases with calculated genetic risk based on an individual’s personal genome, and they provide links to drill down into a particular line item (e.g. disease) to view the underlying data, and perhaps additional visualizations.
Tabular views are also the predominant visualization technique employed by many third-party genome interpretation tools. Both the GET-Evidence tool from the Personal Genome Project and the shareware Promethease tool from the makers of SNPedia use tabular views as their primary visualization technique. Like many DTC web interfaces, Promethease uses tabular views to provide a high-level summary of annotated personal genomic information; however, nested tables are used to reveal underlying information. GET-Evidence uses tabular views to present a more fine-grained view of personal genomic data, presenting annotations relating to the gene in or around which a particular locus is found, and information related to the population frequency and clinical importance of the genotype along with free-text commentary (Figure 4.2).
Variant report for huE80E3D (PGP4: Misha Angrist) CGI var file, build 36
· Name: huE80E3D (PGP4: Misha Angrist) CGI var file, build 36
· This report: evidence.personalgenomes.org/genomes?fe9f72be9699820adc9af9e001500e02189adc84
· public profile: my.personalgenomes.org/profile/huE80E3D
· Download: source data (373 MB), dbSNP and nsSNP report (126 MB)
CPT2-S113L | High | Well-established | pathogenic; Recessive, Carrier (Heterozygous) | 0.78% | This is the most common variant associated with late-onset carnitine palmitoyltransferase deficiency, which is classically viewed as recessive. Many patients are heterozygous for this, but are presumably compound heterozygous.
TREM2-R47H | High | Uncertain | pathogenic; Recessive, Carrier (Heterozygous) | 0.78% | Unreported, predicted to be damaging. Other recessive mutations in this gene cause polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy (a severe genetic disorder, usually lethal by age 50).
CC2D2A-G776R | High | Uncertain | pathogenic; Recessive, Carrier (Heterozygous) | 0.78% | Unreported, predicted to be damaging. Other recessive mutations in this gene cause Joubert Syndrome and Meckel Syndrome.
SPG11-K1013E | High | Uncertain | pathogenic; Recessive, Carrier (Heterozygous) | 0.78% | One unpublished report links this to causing spastic paraplegia in a recessive manner, but insufficient data exists to evaluate significance. Most mutations reported to cause the disease in this gene are more severe null mutations (frameshift or nonsense).
Figure 4.2 Tabular view of rare pathogenic variants: The tabular view of personal genomic information provided by the GET-Evidence report
offers the ability to interactively sort personal genomic information according to various properties or annotations. This sorting functionality goes
beyond the functionality typically provided by DTC genomics services. However, the heavy reliance on text limits the effectiveness of this visual
representation. Image from GET-Evidence report.
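The sorting behavior that distinguishes this kind of report can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the way GET-Evidence or any DTC service implements it: the rows, the column names (`snp`, `odds_ratio`, `allele_freq`), and the helper functions `sort_rows` and `render` are all hypothetical, and every number is invented.

```python
from operator import itemgetter

# Hypothetical annotation rows; every identifier and number is invented.
rows = [
    {"snp": "snp_a", "odds_ratio": 1.10, "allele_freq": 0.32},
    {"snp": "snp_b", "odds_ratio": 1.45, "allele_freq": 0.05},
    {"snp": "snp_c", "odds_ratio": 0.85, "allele_freq": 0.48},
    {"snp": "snp_d", "odds_ratio": 1.45, "allele_freq": 0.01},
]


def sort_rows(table, keys):
    """Return rows ordered by several columns.

    keys is a list of (column, descending) pairs, most significant first.
    """
    ordered = list(table)
    # Python's sort is stable, so sorting by the least significant key
    # first and the most significant key last gives a multi-column sort.
    for column, descending in reversed(keys):
        ordered.sort(key=itemgetter(column), reverse=descending)
    return ordered


def render(table, columns):
    """Format rows as fixed-width text lines, header first."""
    lines = ["  ".join(f"{c:<12}" for c in columns)]
    for row in table:
        lines.append("  ".join(f"{str(row[c]):<12}" for c in columns))
    return "\n".join(lines)


# Rank by odds ratio (largest first), breaking ties by rarer allele first.
ranked = sort_rows(rows, [("odds_ratio", True), ("allele_freq", False)])
print(render(ranked, ["snp", "odds_ratio", "allele_freq"]))
```

Each sort key adds one more dimension along which rows can be compared, but the output is still a flat list: nothing in it records how one row relates to another.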
Despite their lack of stimulating aesthetics, there are a number of pragmatic reasons why these companies and third-party tools have chosen to use tabular views:
• Familiarity: A quick scan of any popular magazine or daily newspaper will reinforce the notion that modern humans are generally familiar with information presented in a tabular format. From train schedules to sports scores, our perception has been well conditioned to quickly scan, interpret, and assess information represented in this way. Given any tabular representation of personal genomic information, most individuals would automatically understand how to scan the header row of the table to read the types of data in each column, and would innately understand that the data on each row are related to a single item (e.g. information related to a single SNP).
• Summarization: Although tables can be used in numerous ways, they are commonly used to summarize information, and therefore most individuals are conditioned to perceive summary data using a tabular representation. One of the most common uses of tabular summarization is the representation of frequency tables, where each row typically represents a count or proportion of the occurrence of some value in a particular group or interval (e.g. the percentage of individuals who responded “yes” or “no” to particular questions in a survey). Tabular views are frequently used by DTC companies in their user interfaces to summarize quantitative (e.g. combined genetic disease risk) or categorical (e.g. presence or absence of a trait) properties of a personal genome.
• Sorting/ranking: Another familiar or intuitive property of tabular views is the notion of sorting or ranking the rows according to values in one or several of the columns. Most readers will be familiar with the tabular views utilized by online merchants that typically allow items to be sorted according to price. We can just as easily imagine sorting or ranking tabular summarizations of annotated personal genomic information according to the odds ratio or allele frequency of a particular SNP. The ability to sort or rank data in a tabular view adds another dimension along which the data can be compared.
• Transparency: Due to the pervasive use of spreadsheet paradigms and row-based databases, data is often stored using tabular representations. Therefore tabular representations are often the most straightforward way to display “raw” data underlying a personal genome, or to cross-reference this information with other relevant data tables (e.g. a look-up table containing disease-associated SNPs). One example of this is the GET-Evidence Variant Report tool, which displays rows of metadata for annotated variants that are found in a personal genome by cross-reference.
Despite these positive attributes, there are a number of reasons why tabular views may be insufficient or limited for visualizing personal genomic information:
• Limited ability to show relationships: Perhaps one of the most striking limitations of tabular views for visualizing personal genomic information is the limited ability to show relationships between row entities. Nested tables are sometimes used to imbue hierarchy into tabular views, but this approach can quickly clutter a table, and cannot be applied when there are many-to-many relationships between entities. To illustrate this limitation in the context of personal genomics, take a look at the tabular representation of disease risk presented by the 23andMe interface (Figure 4.1). In this case, each disease is represented as a discrete row; however, what is not shown are the known relationships between the diseases, which can be important for evaluating disease risk. For example, Type 2 diabetes is known to be a risk factor for myocardial infarction (i.e. heart attack), yet this important disease interaction is lost in the tabular representation. In fact, the majority of interfaces provided by DTC companies and third-party tools succumb to this same limitation due to their use of tabular views for representing disease risk.
• Limited perception of trends: While tabular views might provide the most accurate representation of the underlying data, in that they are often used to display “raw” or minimally processed data, the limitations of human perception can constrain what can be effectively communicated using tabular views. In particular, human perception is decidedly poor in its ability to distinguish trends from tabular data representations. Although human perception is attuned to perceiving rank order in sorted tables, it is quite limited in its ability to assess the possible magnitude or slope of a trend, or the magnitude of the difference between trends in a multidimensional table. For example, as shown in Figures 4.3 and 4.10, the dramatic effect of SNP selection on running genetic risk is much more apparent in the graphical representation of the data in comparison to the tabular representation.
[Figure 4.3 graphic: line plot of Running LR against the Prior; x-axis: SNP Index (Ordered By Study Size); the underlying table is shown below the graph]
Figure 4.3 Likelihood ratios plot juxtaposed with underlying tabular data: The 2-D graph displays the Running Likelihood Ratio (LR) quantity
from the table below the graph. Although both the graph and the table represent the same basic Running LR data, the trend of this data is much
more apparent in the graph representation. In this case, the graph readily reveals that the included SNPs in the risk model first cause the overall
risk estimate to fall below the population average (red line), but then to trend above the average as additional SNPs are added to the model. This
trend is perceptible, although less apparent in the tabular representation of the data.
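The contrast between reading a column of numbers and seeing their trend can be tried out with a short sketch. It rests on an assumption that the caption does not spell out: that a running likelihood ratio is the cumulative product of per-SNP likelihood ratios as SNPs are added to the model. The function names `running_lr` and `text_chart` are hypothetical, and the likelihood ratios are invented to mimic the shape described in the caption (a dip below the baseline followed by a rise above it).

```python
from itertools import accumulate
from operator import mul


def running_lr(per_snp_lrs):
    """Cumulative product of likelihood ratios, one value per SNP added.

    Assumes the per-SNP likelihood ratios can simply be multiplied.
    """
    return list(accumulate(per_snp_lrs, mul))


def text_chart(values, baseline=1.0, width=40, ceiling=2.0):
    """Draw one bar per value; '|' marks the baseline (LR = 1)."""
    mark = round(baseline / ceiling * width)
    lines = []
    for index, value in enumerate(values, start=1):
        length = round(min(value, ceiling) / ceiling * width)
        bar = list("#" * length + " " * (width - length))
        bar[min(mark, width - 1)] = "|"
        lines.append(f"{index:>2} {value:5.2f} {''.join(bar)}")
    return "\n".join(lines)


# Invented per-SNP likelihood ratios, ordered by study size.
snp_lrs = [0.80, 0.90, 0.95, 1.30, 1.25, 1.10]
trend = running_lr(snp_lrs)
print(text_chart(trend))
```

The numeric column printed beside each bar is the tabular view of the same data; the crossing of the baseline marker is far easier to spot in the bars than in the digits.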
tively inefficient in doing so compared to many other tasks of perception. One needs to simply open a newspaper to a full page of stock quotes, or stare at a large spreadsheet of numbers to feel the weight of the cognitive load that such representations impart. Therefore, tabular views may cause the viewer to expend comparatively large amounts of time and energy to interpret if they are sufficiently large or complex. In addition, humans have particularly low attention thresholds in perceiving such data, and often prefer to simply focus on the top-most portion of the data. This tendency is captured in the behavior of people who use web search engines, where studies find that more than 90% of web searchers tend to click links found in the first set of search results presented, despite the fact that relevant links might be found in the next several pages of search results. These variant reports demonstrate these limitations using personal genomic information (Figure 4.3). Most viewers will have a tendency to focus on the first few rows displayed in the table and use the sorting feature to adjust this view, rather than scroll down and view all the rows presented by the viewer.
Although tabular views are commonly used for visualizing personal genomic information, we hope that readers will understand the strengths and weaknesses of these views as they interpret their own personal genomic information using established tools, or embark on the development of their own visualizations.
4.3 Ideograms
Ideogram representations are used to display iconographic maps of genomic information using the visual metaphor of chromosomal organization. Because the term karyotype is also used to refer to a visual representation of chromosomes and their organization, ideograms are sometimes called karyograms or digital karyotypes (if computer generated), or virtual karyotypes. Karyotype representations are born out of a subfield of genetics called cytogenetics, which is largely concerned with the study of relationships between chromosomal structure and organization, and how this relates to the structure and function of cells (Figure 4.4). Early efforts in karyotyping used a technique called G-banding, in which chromosomes are stained with a chemical solution that causes dark bands to appear at certain locations in the chromosomes; these bands correspond to A-T rich regions that contain fewer functional gene regions. Many digitally generated karyotypes (i.e. ideograms) will shade chromosomes according to locations of known genes to simulate this G-banding. However, more modern cytogenetic techniques, such as fluorescence in situ hybridization (FISH), can offer more detailed, and colorful, representations of chromosomal structure and organization.
In addition to their practical use for studying chromosomes for genetic research, karyotypes have also been used as diagnostic aids in medicine to detect and diagnose chromosomal abnormalities in individual patients. In this regard, a clinician will typically scan an individual’s karyotype to look for signs of aneuploidy, or an abnormal number of chromosomes. One of the most widely known aneuploidies in humans, especially for those who have experienced prenatal medical care, is trisomy 21 (i.e. having three copies of chromosome 21), which is the primary etiological factor underlying Down’s syndrome. Trisomy 21 is a congenital aneuploidy because it is present from conception. Other common congenital aneuploidies in humans include Edwards syndrome (trisomy 18), Patau syndrome (trisomy 13), and Turner syndrome (monosomy of the X chromosome in females). Somatic aneuploidy, i.e. chromosomal abnormalities that arise after birth and into adulthood, is also observed in many cancer cells, particularly in the hematological cancers known as leukemias. Because full genome sequencing is not yet commonly performed on infants or to profile individual patient tumors, ideographic representations of individual karyotypes from personal genomic information typically conform to the “normal” arrangement of human chromosomes. However, there are methods for detecting large structural variations in personal genomic information (discussed in Chapter 11), and methods to visualize these variations are likely to become an important aspect of visualizing personal genomic information spatially.
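The clinician's scan for aneuploidy described above amounts to comparing observed chromosome counts against the expected ones. The sketch below is a deliberately simplified illustration: the functions `normal_karyotype` and `find_aneuploidies` and the count-per-chromosome input format are hypothetical, and it ignores mosaicism, structural rearrangements, and every sex-chromosome variation other than the one named in the text.

```python
AUTOSOMES = [str(n) for n in range(1, 23)]

# Syndromes named in the text, keyed by (chromosome, copy count).
NAMED = {
    ("21", 3): "trisomy 21 (Down syndrome)",
    ("18", 3): "trisomy 18 (Edwards syndrome)",
    ("13", 3): "trisomy 13 (Patau syndrome)",
}


def normal_karyotype(sex="XX"):
    """Copy counts for a typical karyotype: two of each autosome."""
    counts = {chrom: 2 for chrom in AUTOSOMES}
    counts["X"] = sex.count("X")
    counts["Y"] = sex.count("Y")
    return counts


def find_aneuploidies(copy_counts):
    """List departures from the expected number of chromosomes."""
    findings = []
    for chrom in AUTOSOMES:
        copies = copy_counts.get(chrom, 0)
        if copies != 2:
            default = f"{copies} copies of chromosome {chrom}"
            findings.append(NAMED.get((chrom, copies), default))
    x, y = copy_counts.get("X", 0), copy_counts.get("Y", 0)
    if (x, y) == (1, 0):
        findings.append("monosomy X (Turner syndrome)")
    elif (x, y) not in {(2, 0), (1, 1)}:
        findings.append(f"atypical sex chromosome count: X={x}, Y={y}")
    return findings


patient = normal_karyotype("XY")
patient["21"] = 3  # hypothetical patient with an extra chromosome 21
print(find_aneuploidies(patient))  # ['trisomy 21 (Down syndrome)']
```

A karyotype image conveys the same information visually: the extra chromosome is simply a third shape in the slot where two are expected.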
[Figure 4.4 graphic: panels (a) to (d), each pairing G-banded chromosome images with their ideograms and cytogenetic band labels]
Figure 4.4 The biomimetic basis of ideograms: Ideograms are digital graphic representations of chromosomes that are inspired by the classical
cytogenetic techniques for karyotyping chromosomes using chemical stains. The dark bands used to distinguish regions of an ideogram are analogs
of the G-banding pattern that is observed as the result of karyotype staining of human chromosomes. This figure shows micropictographs of
several G-banded human chromosomes next to their ideogram analogues. Reproduced from Harada, N. et al. Subtelomere specific microarray
based comparative genomic hybridization: a rapid detection system for cryptic rearrangements in idiopathic mental retardation. Journal of Medical
Genetics 41, 130–6 (2004) with permission from BMJ Publishing Group Ltd.
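The band labels drawn on an ideogram are what make cytogenetic locations such as 5p13.2 or 11q23.1 readable: a chromosome, an arm (p for the short arm, q for the long arm), then region, band, and an optional sub-band. A minimal parsing sketch follows. The function name `parse_cytoband` and the returned dictionary layout are hypothetical, and the pattern covers only simple single-band locations on chromosomes 1 to 22, X, and Y, not ranges or terminal notations.

```python
import re

# chromosome, arm, one-digit region, one-digit band, optional sub-band
_CYTOBAND = re.compile(
    r"^(?P<chrom>1[0-9]|2[0-2]|[1-9]|X|Y)"
    r"(?P<arm>[pq])"
    r"(?P<region>\d)(?P<band>\d)"
    r"(?:\.(?P<sub_band>\d+))?$"
)

ARM_NAMES = {"p": "short arm", "q": "long arm"}


def parse_cytoband(location):
    """Split a location such as '5p13.2' into its named parts."""
    match = _CYTOBAND.match(location)
    if match is None:
        raise ValueError(f"not a recognized band location: {location!r}")
    return {
        "chromosome": match["chrom"],
        "arm": ARM_NAMES[match["arm"]],
        "region": int(match["region"]),
        "band": int(match["band"]),
        "sub_band": match["sub_band"],  # kept as text; None when absent
    }


print(parse_cytoband("5p13.2"))
```

An interactive ideogram could use a structure like this to jump to the named band when a location from the literature is typed in.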
From the standpoint of personal genomics, the primary utility of ideogram representations is to visualize patterns of personal genomic information across the genome, and to orient exploration of personal genomic information (Figure 4.5). The primary benefits of using ideograms to visualize personal genomic information include the following.
• Biological organizing principle: Ideograms organize personal genomic information in a manner that is analogous to the way in which an individual’s DNA is organized in their cells. This allows for visual orientation around important biological features of chromosomal DNA, such as centromeres and telomeres. Additionally, recall
Chromosome   Bases   Genes   SNPs
1            247M    2815    81k
2            242M    1860    81k
3            199M    1497    66k
4            191M    1136    57k
5            180M    1276    58k
6            170M    1575    72k
7            158M    1512    56k
8            146M    1013    51k
9            140M    1245    44k
10           135M    1105    52k
11           134M    1850    51k
12           132M    1370    49k
13           114M    545     38k
14           106M    1315    32k
15           100M    998     31k
16           88M     1098    31k
17           78M     1494    30k
18           76M     429     29k
19           63M     1743    19k
20           62M     768     24k
21           46M     360     14k
22           49M     749     15k
X            154M    1376    30k
Y            57M     284     5k
MT           16k     37      4k
Figure 4.5 Using ideograms for visual orientation: Commercial DTC genomics companies will often employ ideogram representations in their
products to enable the visualization or browsing of personal genomic information guided by the biological organizing principle of chromosomal
arrangement. The top graphic demonstrates the use of ideograms to facilitate the navigation of the raw genotype data for an individual. This view
enables an individual to click on a chromosome to begin exploring the personal genotyping data available for the selected chromosome. The
bottom graphic demonstrates the use of ideograms for organizing the results of analysis based on personal genomic information. In this case, the
ideogram is color coded to represent the predicted ancestral origins of each chromosomal region for a personal genome. Image from Lumigenix
genetic testing service.
[Figure 4.6 graphic: Karyotype overview (chromosomes 1 to 22, X, Y); Chromosome 11 view (Start: 1, End: 135006516, Length: 135006516) with band labels from p15.4 to q25; tracks: Gene Names (DRD2), Cancer Mutations, Son, Mom, Dad]
Figure 4.6 Linked ideogram visualization: The browsing interface provided by the myKaryoView tool uses ideogram representations at multiple levels of detail to enable interactive browsing
and navigation from the broader genome level to individual chromosomes, and the individual personal genome sequence level data of individuals. All three levels are linked such that changes
in one level of detail, such as a shifting of focus to a different chromosome at the top level, are reflected with appropriate changes at all three levels of detail. Image from myKaryoView tool.
that variants that are nearby are often inherited together, a phenomenon known as linkage disequilibrium (LD; see Chapter 1). To this end, variants in neighboring genes can be quickly visualized using ideograms.
• Spatial orientation: Many genetic studies report findings using a standardized nomenclature to refer to a particular genomic region, such as a region associated with a particular disease mechanism. For example, eye color and skin pigmentation traits are associated with the region 5p13.2 (see Chapter 1). An ideogram makes it easy to use this genomic coordinate nomenclature to navigate to relevant regions on a personal genome.
• High information density: One major benefit of an ideogram representation is that it offers a compact means to visualize the entire breadth of information contained in the over 3 billion base pairs represented in a personal genome.
• Standardization: Linear representations of chromosomes are often used in genomics literature, such as the Manhattan plots often found in genome-wide association studies (GWAS). Ideograms are effective tools for visualizing these data as they map directly to linear representations.
Despite these positive aspects of ideograms, their frequent use for representing genomic information should not be taken as a sign that they are necessary for genomic visualization. There are several reasons why ideograms may not be appropriate for visualizing personal genomic information.
• Lack of general familiarity: While the chromosomal metaphor embodied by ideograms may be intuitive for those who have formal training in genetics, amateurs, hobbyists, and lay individuals are unlikely to find it familiar or intuitive. Even many medical professionals are unlikely to be familiar or comfortable with ideogram representations.
• Biological analogy not always beneficial: Although ideograms organize personal genomic information in a manner that is analogous to the biological organization of DNA in the cell, such efforts to maintain biological accuracy may obscure or limit the primary purpose of personal genome visualization, which is to convey important information about a personal genome. Therefore, factors such as the intended audience and the nature of the information being communicated should take precedence over biological accuracy.
• Limited resolution: One of the main benefits of the ideogram representation, mentioned previously, is that it provides a relatively compact and succinct means to visualize the tremendous quantity of information in a personal genome. However, as with most visualizations, this benefit comes as a trade-off that can limit other aspects of the visualization. In this case, the compact representation of the ideogram limits the resolution at which information can be displayed. Therefore, ideograms may be useful for showing gross patterns along a chromosome, such as recombination patterns, but insufficient for showing details that require base-pair level resolution, such as the absence or presence of a trait-associated allele.
The pervasiveness of ideograms in genome visualization is largely driven by the advantages that this representation provides. However, this visualization technique should only be implemented after careful consideration of the purpose of the visualization, the intended audience, the nature of the information being communicated, and the known limitations. Some of the limitations of ideograms can be compensated for by using them in combination with other visualization modalities, such as the brushing and linking technique used by myKaryoView to connect visualizations representing different levels of detail (Figure 4.6).
4.4 Genome browsers
It could be easily argued that the visualization modality known as the genome browser represents the ancestor of most modern approaches to visualizing genomic information. When researchers needed a means to provide access to and disseminate the initial draft data for the Human Genome Project, a team at the University of California Santa Cruz (UCSC) developed what has become
Figure 4.7 Linear genome browser with multiple tracks of genomic information. Linear genome browsers, such as the venerable UCSC Genome Browser shown in this graphic, typically enable
rich, multidimensional views of genomic information by providing the ability to add synchronized “tracks” of information along with a genome sequence. These tracks can contain additional
genomes for visual comparison, or other sources of relevant genomic information, such as the locations of disease variants, measures of evolutionary conservation, or functional genomic
information such as transcription factor binding regions. Image from UCSC Genome Browser.
one of the most widely used and recognized tools in genomic research, the UCSC Genome Browser. The UCSC Genome Browser provides a visualization interface that renders the genome sequence data as a linear sequence at varying levels of granularity and allows the viewer to navigate along the length of the sequence, jump to specific locations in the genome using genomic coordinates, or zoom in and out to varying levels of detail (Figure 4.7). Since its initial release, the UCSC Genome Browser has evolved into a generalized tool for visualizing nearly any type of genomic information, including the genomes of other species, and annotative data such as the locations of polymorphisms or transcription factor binding regions. Since the inception of the UCSC Genome Browser, many additional genome browsers have been created using the same visualization paradigm, giving rise to an entire class of linear genome browsers.
4.4.1 Linear genome browsers
Linear genome browsers present genome sequence data along a linear coordinate system that allows for viewing and navigation of the sequence data along the entire length of the sequence at varying levels of detail. One of the core features of most genome browsers, and one of their primary strengths, is the ability to visualize and navigate multiple tracks of data in parallel with the genome sequence data. These tracks can incorporate any type of information that can be mapped to a genomic coordinate system, such as the location of disease-associated SNPs, or the relative evolutionary conservation of positions across mammalian genomes. The benefit of this approach is that it enables the visualization of multiple forms and dimensions of genomic information across parallel tracks in the same field of view. Each individual track can employ any type of visualization or glyph that fits on to the genomic coordinate system, and can scale appropriately as the user zooms in and out of varying levels of detail. For example, the relative evolutionary conservation of positions or regions might be represented by a histogram to convey the relative degree of conservation (Figure 4.7). If designed appropriately, these visualizations can engage the powerful pattern matching capability of human perception to enable the discovery and visualization of informative patterns in genomic information.
Linear genome browsers provide an effective visual framework for several important aspects of exploring and interpreting personal genomic information:
• Browsable access to raw data: Because linear genome browsers typically incorporate standard genomic coordinate systems (e.g. based on a reference genome), they can offer a convenient means for browsing or searching along personal genome sequence data. Furthermore, most linear genome browsers employ a straightforward biological organizing principle, in which the data are typically organized by their linear orientation along human chromosomes. This makes it straightforward to navigate to regions containing specific genes, or other functional elements, whose chromosomal locations and exact reference coordinates are typically known a priori.
• Integrative assessment: One of the main benefits of linear genome browsers is the ability to render integrated views of genomic information in which different data types are rendered in parallel tracks along a common coordinate system. This paradigm can be useful for mostly qualitative functional assessment of personal genomic information, where functional genomic information can be rendered in parallel to a personal genome sequence to visually assess possible functional consequences of personal genome variation. For example, a personal genome might be visualized in a linear genome browser along with one parallel track showing population allele frequency information, another track showing the strength of transcription factor binding activity measured from some population reference, and another showing a measure of the conservation of each position across mammals (Figure 4.7). Visual inspection of these tracks might quickly identify positions where an individual harbors a rare (< 1% MAF) or personal variant allele at a position that is both highly conserved across mammals (and therefore likely to be functionally important) and located within a peak region of transcription factor binding activity upstream of a gene. Such positions or regions might be rapidly identified by visual inspection in a genome browser, and further investigated using quantitative or experimental methods for functional assessment.
• Comparative assessment: Because linear genome browsers are able to lay out multiple personal genome sequences along parallel tracks that are spatially synchronized along a common coordinate system, they are useful for facilitating visual comparative analysis of personal genome sequences. There are a number of scenarios in which comparative analysis of personal genomes may be useful and informative. One example is a scenario in which there is a need to assess differences between the normal (germline) DNA sequence of an individual cancer patient and the complete DNA sequence of the patient’s tumor genome. In this scenario, a pathologist might use a linear genome browser to compare the sequences at a low level of detail to scan for large structural differences (e.g. segmental insertions or deletions) between the two genomes, or zoom in to a finer level of detail to scan for single-nucleotide variants in known oncogenes. Another example is the comparative assessment of the personal genomes of family members, where the family is suspected of harboring rare disease variants. Although pedigree analysis is typically used for such scenarios, a linear genome browser might be used to visually identify genomic regions with obvious differences between affected and unaffected individuals in the family.
Linear genome browsers also have notable limitations:
• Inability to visualize non-linear interactions: While a linear representation of a genome serves as a straightforward and intuitive paradigm for exploring and visualizing a personal genome, it is far removed from the actual biological context of a genome, which is coiled around histone proteins and packaged into chromatin in the cell nucleus. Within the nuclear environment, genomic DNA participates in a large number of non-linear interactions, such as long-range chromatin interactions between non-coding regulatory regions and gene-coding regions facilitated by transcription factors. Genomic regions that are distant in linear terms (i.e. many megabases apart) may actually be relatively proximal within the chromatin structure of the cell. Such interactions can be difficult to visualize or interpret using linear representations of a personal genome.
• Cognitive overload: Linear genome browsers excel at facilitating comparative analysis of local genomic regions (e.g. a region containing the upstream promoter and exons of a single gene) across parallel tracks of information; however, this modality requires the user to specify a genomic region of interest a priori. Linear genome browsers are less useful for ad hoc browsing of a personal genome to identify gross patterns of interest, because the density of information across even a single chromosome can quickly overload the cognitive machinery of human perception, limiting its ability to identify meaningful patterns in the visual data.
[Figure 4.8 graphic: circular plot with chromosomes 1–22 and X arranged around the circumference; annotated features are interchromosomal rearrangement, intrachromosomal rearrangement, point mutation, and copy-number change.]
Figure 4.8 Circular genome graphic displaying multiple layers of information for a cancer genome. Circular representations of genomic
information, such as the Circos plot of cancer genome information shown in this figure, allow for dense representations of genomic information
that are still easily interpreted by human perception. Like linear genome browsers, multiple tracks of information are used to add various data
dimensions to the visualization. However, circular representations allow for the representation of long-range relationships, such as the interchro-
mosomal rearrangements represented by the connecting segments in the center of the graphic here (see also Plate 2). Reprinted by permission
from Macmillan Publishers Ltd: Ledford, H. Big science: The cancer genome challenge. Nature 464, 972–4 (2010).
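The geometry behind a circular genome graphic can be illustrated with a short sketch. The code below is a minimal illustration, not the method used by Circos or any other package: it maps a chromosome and position to an angle on a circle, and computes the end points of a connecting segment such as those drawn for rearrangements in the center of Figure 4.8. The names `genome_to_angle`, `polar_to_xy`, and `link_endpoints`, and the toy chromosome lengths, are assumptions introduced for this example.

```python
import math

# Hypothetical chromosome lengths (arbitrary units), chosen only to keep the
# arithmetic easy to follow; real lengths would come from a reference genome.
TOY_GENOME = [("chrA", 100), ("chrB", 300)]


def genome_to_angle(chrom, pos, chrom_lengths):
    """Map a (chromosome, position) pair to an angle in radians.

    Chromosomes are laid end to end around the circle in list order, so the
    angle grows with cumulative genomic position from 0 to 2*pi.
    """
    total = sum(length for _, length in chrom_lengths)
    offset = 0
    for name, length in chrom_lengths:
        if name == chrom:
            if not 0 <= pos <= length:
                raise ValueError(f"position {pos} is outside {chrom}")
            return 2 * math.pi * (offset + pos) / total
        offset += length
    raise KeyError(chrom)


def polar_to_xy(angle, radius):
    """Convert an angle and a track radius to Cartesian plot coordinates."""
    return radius * math.cos(angle), radius * math.sin(angle)


def link_endpoints(locus_a, locus_b, chrom_lengths, radius=0.5):
    """End points of a segment joining two loci on an inner circle."""
    ends = []
    for chrom, pos in (locus_a, locus_b):
        angle = genome_to_angle(chrom, pos, chrom_lengths)
        ends.append(polar_to_xy(angle, radius))
    return tuple(ends)
```

Additional data tracks would reuse the same angle with a different radius, which is how concentric tracks stay aligned to a common genomic coordinate.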
implementation of this paradigm is the Circos software package (<http://circos.ca>), which has been used to render genomic information in numerous high-profile scientific and popular media publications (Figure 4.8). Other notable tools employing the annular paradigm include Mitowheel (<http://mitowheel.org>), DNAPlotter (<http://www.sanger.ac.uk/resources/software/dnaplotter/>), and CGView (<http://wishart.biology.ualberta.ca/cgview/>). Similar to the linear genome browsers, these browsers can render parallel tracks along the same coordinates as the primary genome track, which are rendered as concentric circular tracks. Circular compositions offer a number of advantages over linear representations. Chief among these is the ability to visualize interactions or other types of relationships between spatially distant regions, such as regulatory or physical (e.g. protein-protein) interactions, succinctly. Additionally, circular compositions are capable of rendering greater information densities, and are capable of simultaneous display of variable resolution data, where low-resolution representations (e.g. chromosomal organization) can be displayed near the circular center, and higher-resolution information (e.g. exons or SNP locations) can be displayed coordinately along circular tracks at outer radii.
Figure 4.9 Custom genome browser based on the JBrowse software client. There are several projects aimed at providing genome browser frameworks that can be used to serve as the basis for customized genome browsers. The genome browser shown in this figure was adapted from the open-source JBrowse genome browser to provide a customized browser for the GenomesUnzipped personal genomics community. Image from Genomes Unzipped personal genome browser.
• Cartographic genome browsers: Building on the paradigm of traditional geospatial maps, such as the popular Google Maps web tool, cartographic genome browsers allow for navigation and visualization of personal genomic information along a planar coordinate system. Whereas linear genome browsers typically allow freedom of navigation bilaterally along the length of a genome sequence, cartographic genome browsers represent the entire genomic sequence across a plane, and allow for 360° navigational freedom at varying levels of detail (i.e. zoom). To use the traditional roadmap analogy, linear genome browsers are like a constrained map interface that would
only allow you to visualize and navigate along a single road or highway at a time, whereas a cartographic genome browser is like a fully functional map browser that allows you to navigate in any direction across a city or country. Examples of cartographic genome browsers include DNA Guide (<http://dnaguide.com>) and Genome Projector (<http://www.g-language.org/g3/>).
4.4.3 Building custom genome browsers
Although there are numerous genome browsers available, many as free downloads or public web tools, there are likely to be many present and future scenarios in which the existing genome browsers lack some necessary functionality. This is likely to be especially true for scenarios involving personal genomes, because the established genome browsers were largely developed to serve the needs of the various genome science and research communities. For instance, a personal genome community, Genomes Unzipped, has developed its own genome browser to facilitate personal genome exploration (Figure 4.9). Because personal genomics has applications and an audience beyond scientific research, it is likely that new types of genome browsers will need to be developed to serve the various aspects of personal genome exploration, ranging from clinical interpretation to genomic genealogy. Fortunately, there are already software tools available that help to facilitate the development of customized genome browsers. These tools provide the necessary toolkit and software “scaffolding” by which a custom genome browser can be created. However, it should be noted that the currently available genome browser toolkits require substantial prerequisite technical knowledge for their use. There does not yet exist a “point-and-click” genome browser toolkit that would allow non-technical individuals to create custom genome browsers with the same ease with which one interacts with modern word-processing software. However, it is likely that such software will emerge in the future, at least in some form, as personal genomics continues to gain interest among broader demographics.
4.5 Visual quantitative assessment
Although genomic data visualizations are often designed to facilitate the display of summary information (e.g. a list or location of variants in a given gene), or to identify general patterns (e.g. locations of genomic rearrangements), visualization techniques can also be used to facilitate quantitative assessment and interpretation of personal genomic information. Visualizations designed for quantitative assessment differ from other visualizations in that their purpose is to guide or otherwise enable the viewer to determine some accurate quantity through the aid of graphical devices. Careful consideration must be taken in the design of such visualizations to ensure that data are represented accurately, and that each graphical element serves a clear purpose, so that ambiguity cannot confuse or distort interpretation. Although graphical representations of quantitative data are commonplace in daily life, such as stock market or sports-score graphs in newspapers, it is non-trivial to design graphical representations that facilitate accurate and effective assessment of quantitative information. One could argue that this is especially true for personal genomic information, which is not only vast and complex, but also deeply personal, with potentially hazardous consequences from misinterpretation. For example, an inaccurate quantitative representation of risk for a particular disease could cause an individual to worry needlessly about their personal health. Although it does not address genomics or biology specifically, we suggest that readers interested in developing these types of visualizations become familiar with the text “The Visual Display of Quantitative Information” by Edward Tufte. This book serves as a fundamental treatise for understanding the theory and practice of designing accurate and effective visual representations of quantitative information.
4.5.1 Nomograms for disease risk assessment
One frequent use of personal genomic information is to assess an individual’s genetic risk for various disease conditions. However, in order to assess genetic risk for even a single disease, a substantial amount of information must be taken into
consideration. This includes the incidence of the disease and the set of variants known to be associated with risk of developing the disease, along with their relevant statistical properties (e.g. the odds ratio and statistical significance of the effect). Information from multiple variants must be combined with the disease incidence data to quantify an individual’s overall genetic risk for developing a disease (Figure 4.10). The process and relevant methods for combining this information will be discussed in more detail in Chapter 6, but here we must consider that each variant contributes unique quantitative information to the final disease risk score, and variants rarely contribute equally to the final composite risk score. Because there are as yet no commonly established guidelines regarding the inclusion of variants in risk assessment models, it is good practice to reveal the specific variants that comprise a particular genetic disease risk model, and also to represent their individual contribution to the composite risk score.
[Figure 4.10 graphic: scatter plot with SNP index (1–17) on the horizontal axis and a vertical axis running from 10 to 32; points are labelled with the SNP identifiers rs985694, rs1884051, rs7903146, rs3753242, rs3020317, rs2283228, rs7901695, rs9295475, rs6769511, rs9939609, rs9460546, rs2792248, rs4376068, and rs9465871.]
Figure 4.10 Graphical representation of multilocus genetic risk for Type 2 diabetes estimated from a personal genome. Each point represents a SNP locus in the genetic risk model, which are plotted to show the running posterior probability of developing disease based on the inclusion of individual risk variants.
Most DTC genomics companies have internal policies governing the inclusion of variants into disease risk profiles, and do provide visual representations showing both the overall risk for a disease and the contributions of individual variants to the disease risk model. However, these representations are not designed to facilitate decision making […] elements of the disease model. Therefore, they do not enable the individual to assess the specific quantitative effects of inclusion or exclusion of a particular variant. Of course, this is reasonable, because one aspect of the perceived value of DTC genomics services is that they obviate the need for the customer to perform any complex decision making regarding the quantitative assessment of their genomic information. However, these visualizations essentially preclude individuals from easily drawing quantitative assessments that would deviate from the model imposed by the company (e.g. removing a variant discovered in a population with a different ethnic background than the individual).
One novel approach towards a graphical representation that enables precise visual quantitative assessment of personal genetic disease risk is based on a classical graphical computing paradigm called nomograms. Nomograms were invented by the French mathematician Philbert Maurice d’Ocagne in the late 1800s as a means to facilitate visual computation of complex mathematical equations. Nomograms became popular in medicine as a graphical tool for visually computing probability equations pertaining to diagnosis. For example, a nomogram might be designed such that it would allow a physician to easily compute the post-test probability of a particular disease diagnosis given the results of a particular blood test, which has some known probability characteristics (e.g. likelihood ratio) associated with it (Figure 4.11). The visual computation would be performed by marking a straight line from the pre-test probability value through the quantitative result of the test to arrive at an estimated post-test probability on the right-most vertical axis.
To facilitate visual quantitative assessment of individual genetic risk based on a personal genome in a clinical setting, Ashley et al. described a novel representation, the genetic risk nomogram, that enables the visual assessment of multi-allelic genetic risk for a disease using variable inclusion criteria for risk variants (Figure 4.12). In this representation, the disease prevalence, which represents the pre-test probability of disease risk, is represented by a blue dot at the top of the graph. Then, the entire set of SNPs reported to be associated with risk of the disease is represented
sequentially below the dot. For each SNP, relevant annotative data is shown in the left columns: any genes associated with the SNP, the SNP identifier (rsID), and the patient’s genotype at the SNP locus. The right columns contain quantitative information for each SNP: the likelihood ratio of disease risk conferred by the patient’s genotype at the locus, the number of published studies reporting the SNP’s association with the disease trait, and the number of samples (i.e. individuals) measured in those studies. The graphical marks in the center of the nomogram relate to these quantities, where the SNP is represented by a box, the size of the box is proportional to the number of samples used to determine the SNP association, and the shade of the box represents the number of distinct studies reporting the association (more studies result in darker shades). As is apparent in Figure 4.12, the set of associated SNPs is ordered in descending fashion according to the sample size and number of studies, putting SNPs with the greatest statistical confidence near the top of the order. The boxes are arranged on a horizontal axis according to the running post-test probability of disease estimated from the chain of alleles leading up to that SNP in descending order. This allows the observer to visualize the effect of each individual SNP on the overall risk model. To determine the individual’s multi-allelic genetic risk for the disease, an observer would start from the pre-test probability at the top, scan down the list of variants, and stop at the last SNP that meets some inclusion criterion, such as a minimum number of samples used to determine the association. Then, the observer would look at the value of the post-test probability in the right-most column to determine the individual’s genetic risk of disease based on the set of SNPs included up until that point.
The genetic disease risk nomogram introduced by Ashley et al. has several advantages over the representations employed before it. First, it generally follows good quantitative graphical design principles, such as showing the underlying data in a complete and coherent manner, having a good data-to-ink ratio, and having a clear purpose for the design and orientation of graphical marks. Another notable aspect of this visualization is that it integrated a novel form of clinical data (genomic risk markers) with a graphical paradigm that was already established and familiar to those in the clinical domain (nomograms). By leveraging a graphical paradigm that is already established in the domain, the cognitive burden or hindrance to understanding is substantially reduced.
[Figure 4.11 graphic: three vertical axes labelled, from left to right, pre-test probability (0.1 to 99, increasing downwards), likelihood ratio (1,000 down to 0.001), and post-test probability (99 down to 0.1).]
Figure 4.11 Schematic representation of a nomogram. Nomograms are visual computing devices that have traditionally been used in medical practice to visually estimate the posterior probability of a diagnosis given the results of a medical test. The left-most axis represents the pre-test probability of a diagnosis, the middle axis represents the likelihood ratios of test outcomes, and the right-most axis represents the post-test probability obtained by combining the pre-test probability with the given likelihood ratio of the test outcome (on the odds scale, the post-test odds equal the pre-test odds multiplied by the likelihood ratio). The idea behind a nomogram like the one shown is to begin at the pre-test probability axis, extend a straight line from the appropriate pre-test probability value to intercept the middle axis at the value of a test result, and terminate the straight line on the post-test probability axis to determine the value. Image reprinted with permission from Morgan et al. 2010.
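The arithmetic that a nomogram performs graphically can be written out in a few lines. The sketch below is illustrative only and is not taken from Ashley et al.: it converts a pre-test probability to odds, multiplies by a likelihood ratio, and converts back to a probability, then repeats the step along a chain of likelihood ratios to obtain a running post-test probability. Multiplying likelihood ratios in sequence assumes that the contributing factors can be treated as independent; the function names are hypothetical.

```python
def prob_to_odds(p):
    """Convert a probability (0 <= p < 1) to odds."""
    return p / (1.0 - p)


def odds_to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def post_test_probability(pre_test_prob, likelihood_ratio):
    """One nomogram 'line': post-test odds = pre-test odds x likelihood ratio."""
    return odds_to_prob(prob_to_odds(pre_test_prob) * likelihood_ratio)


def running_post_test(pre_test_prob, likelihood_ratios):
    """Running post-test probability after each likelihood ratio in a chain.

    Assumes the likelihood ratios are independent, so that the post-test
    odds of one step can serve as the pre-test odds of the next.
    """
    odds = prob_to_odds(pre_test_prob)
    running = []
    for lr in likelihood_ratios:
        odds *= lr
        running.append(odds_to_prob(odds))
    return running
```

For instance, a pre-test probability of 20% corresponds to odds of 0.25; a likelihood ratio of 4 raises the odds to 1, which is a post-test probability of 50%.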
D Alzheimer’s disease
Pre-test probability: 9·0%
Gene*    SNP location  Patient genotype  LR   Studies†  Samples‡  Post-test probability (%)
TOMM40   rs157581      CT                1·6  6         7740      13·90%
DAPK1    rs4878104     TT                0·7  5         10397     10·19%
TRAK2    rs13022344    CT                1·0  4         6512      10·12%
DAPK1    rs4877365     AA                0·6  4         4841      5·89%
EBF3     rs11016976    TT                1·0  3         5736      5·87%
TNK1     rs1554948     AA                0·9  3         5736      5·32%
MYH13    rs2074877     CT                1·0  3         5366      5·55%
GALP     rs3745833     CC                0·9  3         5366      4·82%
PCK1     rs8192708     AA                0·9  3         5366      4·47%
–        rs1859849     TT                0·9  3         5304      4·02%
–        rs11622883    AT                1·0  3         5248      3·97%
WWC1     rs17070145    CC                0·9  3         2545      3·65%
LMNA     rs505058      TT                1·0  2         4646      3·49%
ACAN     rs2882676     CC                0·9  2         4590      3·22%
PGBD1    rs3800324     GG                0·6  2         4590      2·11%
GOLM1    rs10868366    GG                1·1  2         2156      2·30%
GOLM1    rs7019241     CC                1·1  2         2156      2·49%
–        rs9886784     CC                0·9  2         2156      2·36%
–        rs10519262    GG                0·9  2         2156      2·22%
–        rs463946      CG                0·5  2         1922      1·04%
PLAU     rs2227564     CT                0·9  2         956       0·98%
ADAM12   rs1278279     GG                1·2  1         2320      1·23%
SORL1    rs2070045     GT                1·1  1         2031      1·36%
ABCA1    rs2230806     CT                1·1  1         1691      1·50%
PSEN1    rs165932      GT                0·9  1         170       1·37%
Horizontal axis of the graphic: Risk (%), logarithmic scale from 0·1 to 100.
C Prostate cancer
Pre-test probability: 16%
Gene*    SNP location  Patient genotype  LR   Studies†  Samples‡  Post-test probability (%)
–        rs1447295     CC                0·9  19        56485     15%
TNRC6B   rs9623117     TT                0·9  8         35869     14%
DAB2IP   rs1571801     GT                1·2  6         13997     16%
–        rs6983267     GT                1·0  3         3985      16%
CDH1     rs16260       CC                0·8  3         2238      13%
–        rs6983561     AA                1·0  2         1846      12%
–        rs1551512     TT                0·9  2         1846      12%
MMP2     rs1477017     AG                1·2  1         2878      13%
HIF1A    rs11549465    CC                1·0  1         2878      14%
MMP2     rs11639960    AG                1·2  1         2878      16%
ESR2     rs2987983     AG                1·1  1         2216      18%
TLR10    rs4129009     TT                0·9  1         2163      17%
TLR10    rs4274855     CC                0·9  1         2163      16%
TLR1     rs5743604     AA                0·9  1         2163      15%
–        rs7837688     GT                1·7  1         2139      23%
–        rs4242382     GG                0·9  1         2139      21%
–        rs10086908    TT                1·0  1         2139      22%
–        rs7000448     TT                1·1  1         1012      23%
Horizontal axis of the graphic: Risk (%), logarithmic scale from 10 to 100.
Figure 4.12 Genomic nomograms for genetic disease risk assessment. The genomic nomogram was introduced by Ashley et al. as a visual representation to facilitate clinical assessment and interpretation of genetic disease risk in a personal genome. Disease-associated SNPs are shown in decreasing order of sample size and number of studies showing association. Darker shaded boxes indicate SNPs having the most studies reporting association with disease. The size of each box is scaled in proportion to the logarithm of the number of samples used to calculate the likelihood ratio (LR). The ordering places the SNPs reported in the most and largest studies, which have the most confidence for their association with disease, at the top of the graph. Using this visualization, a clinician could scan down the list from the most confident associations towards the least to choose a threshold based on personal criteria for SNP inclusion. For example, the clinician may only have confidence in SNPs that have been reported to be associated with the disease in two or more studies. In this case, they would stop scanning the list of SNPs at the last SNP meeting the minimum study criterion and look at the post-test probability to the right to determine the individual’s post-test probability of disease given the pre-test probability and the likelihood ratios of the SNPs included in the model up to the chosen point. Image reprinted from Ashley, E. A. et al. Clinical assessment incorporating a personal genome. Lancet 375, 1525–35 (2010) with permission from Elsevier.
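The threshold scan described in this caption can be mimicked in code. The sketch below is an illustration under stated assumptions rather than the procedure of Ashley et al.: it uses the first eight rows of the prostate cancer panel, follows the literal reading of "stop at the last SNP meeting the criterion", and multiplies the likelihood ratios of all SNPs down to that point. Because the likelihood ratios printed in the figure are rounded to one decimal place, the computed value only approximates the post-test probability column. The function name `risk_at_threshold` is hypothetical.

```python
# Rows transcribed from the prostate cancer panel of Figure 4.12:
# (SNP identifier, likelihood ratio, number of studies, number of samples).
PROSTATE_ROWS = [
    ("rs1447295", 0.9, 19, 56485),
    ("rs9623117", 0.9, 8, 35869),
    ("rs1571801", 1.2, 6, 13997),
    ("rs6983267", 1.0, 3, 3985),
    ("rs16260", 0.8, 3, 2238),
    ("rs6983561", 1.0, 2, 1846),
    ("rs1551512", 0.9, 2, 1846),
    ("rs1477017", 1.2, 1, 2878),
]
PRE_TEST = 0.16  # pre-test probability shown at the top of the panel


def risk_at_threshold(pre_test_prob, rows, min_studies):
    """Scan down the ordered rows and stop at the last SNP that meets the
    minimum-studies criterion; return the included SNP identifiers and the
    post-test probability at that point.
    """
    last = -1
    for i, (_, _, studies, _) in enumerate(rows):
        if studies >= min_studies:
            last = i
    included = rows[: last + 1]

    # Chain the likelihood ratios on the odds scale (assumes independence).
    odds = pre_test_prob / (1.0 - pre_test_prob)
    for _, lr, _, _ in included:
        odds *= lr
    return [snp for snp, _, _, _ in included], odds / (1.0 + odds)
```

With a criterion of two or more studies, the scan stops at rs1551512 and yields a post-test probability of roughly 12%, in line with the value printed beside that SNP. A stricter criterion shortens the chain and moves the estimate back toward the pre-test probability.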
[Figure 4.13 graphic: chained nomogram with four vertical axes labelled pre-test probability, body mass index (BMI; categories < 18, 18–25, 25–30, > 30), genetics (individual SNP identifiers, not recoverable from the extracted text), and post-test probability; the probability axes run from 0 to 90.]
Figure 4.13 Integrating clinical and genetic information for disease risk assessment. Likelihood ratios can be chained together to form a
composite measure of disease risk. Traditional nomograms (left, as in Figure 4.11) could be integrated with genomic nomograms (right, as in
Figure 4.12) to illustrate the effect of both clinical factors (such as height, weight, etc.) and genetic factors on a personal disease risk assessment.
[Figure 4.14 graphic: network diagram with the labels smoking, stress, pesticides, alcohol, anticoagulants, interferon-α, antihypertensives, statins, exercise, diet, allergens, antipsychotics, injury, abdominal aortic aneurysm, myocardial infarction, hypertension, depression, coronary artery disease, Type 2 diabetes, obesity, prostate cancer, and osteoarthritis.]
Figure 4.14 Integrative visualization for gene-environment interactions combining personal genomic disease risk estimates with modifiable environmental factors. The integrative gene-environment visualization shown in this figure was introduced by Ashley et al. as a prototype for a gene-environment “report card” which could succinctly and intuitively summarize the complete disease risk profile for an individual patient based on their personal genomic information. The disease labels are scaled in proportion to the individual’s estimated overall risk for each disease, and the directed edges connecting the diseases indicate that one disease predisposes to another in the direction of the arrow. The edge of the circle is annotated with circles that represent modifiable environmental factors that are relevant to the diseases. These environmental factors are connected by dashed arrows to the diseases whose risk they are known to modulate. The size of the circle marking the environmental factor is scaled according to the number of diseases it affects, and the color of the circle represents the maximum overall risk among the diseases to which it is connected, where darker colors represent higher risk. The innovation behind this visualization is that disease–disease interactions are represented, and that a visual scan of the periphery of the visualization facilitates easy identification of environmental risk factors that are likely to have the largest effect on modifying health risks. Image reprinted from Ashley, E. A. et al. Clinical assessment incorporating a personal genome. Lancet 375, 1525–35 (2010) with permission from Elsevier.
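The two visual encodings described for the environmental factors (circle size from the number of connected diseases, circle color from the maximum risk among them) amount to a simple aggregation over a graph. The sketch below is a minimal illustration, not the implementation behind the figure. The risk values and the factor-to-disease connections are hypothetical placeholders chosen for the example and are not read from Figure 4.14; the function names are likewise assumptions.

```python
# Hypothetical overall risk estimates (fractions); not taken from the figure.
DISEASE_RISK = {
    "Coronary artery disease": 0.40,
    "Type 2 diabetes": 0.30,
    "Obesity": 0.25,
    "Osteoarthritis": 0.10,
}

# Hypothetical factor -> diseases edges, for illustration only.
FACTOR_EDGES = {
    "Exercise": ["Coronary artery disease", "Type 2 diabetes", "Obesity"],
    "Diet": ["Type 2 diabetes", "Obesity"],
    "Injury": ["Osteoarthritis"],
}


def summarize_factors(disease_risk, factor_edges):
    """For each environmental factor, return (number of connected diseases,
    maximum risk among those diseases): the quantities that drive circle
    size and circle color, respectively.
    """
    summary = {}
    for factor, diseases in factor_edges.items():
        risks = [disease_risk[d] for d in diseases]
        summary[factor] = (len(diseases), max(risks) if risks else 0.0)
    return summary


def rank_factors(summary):
    """Order factors by maximum connected risk, then by number of diseases."""
    return sorted(summary, key=lambda f: (summary[f][1], summary[f][0]), reverse=True)
```

Ranking the factors this way approximates the "visual scan of the periphery" that the caption describes: the largest and darkest circles come first.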
ple individuals, we may want to explore how similar they might be on a genomic level. The exact methodology for performing this comparison is discussed in Chapter 5; however, from a visualization perspective we could use a relational technique known as a cladogram, or more simply a “tree”. Ashley et al. described a novel relational representation that summarized and represented an individual’s personal disease risk along with environmental factors that are known to play a role in the etiology of the disease, which were connected by lines to indicate the relationship (Figure 4.14). In this representation, diseases that are known to predispose to another were connected by lines to indicate their relationship. These inter-disease relationships are as important as the individual disease risk, because even though an individual might have low risk for one disease, they may have increased risk for diseases that predispose to it.
[Figure 4.15 graphic: pathway diagram of the pancreatic β-cell linking obesity, the adipocytokine signalling pathway, insulin resistance, hyperinsulinism, transient hyperglycemia, impaired insulin secretion, mitochondrial dysfunction, Ca2+-dependent apoptosis, and maturity onset diabetes of the young; labelled components include FFA, TNFα, ERK, IKK, JNK, mTOR, PKCζ, PKCδ, ROS, PDX-1, MafA, INS, VDCC, SUR1, Kir6.2, GLUT2, GK, PYK, glucose, pyruvate, ATP, Ca2+, and K+.]
Figure 4.15 Functional map of metabolic signaling pathways annotated with personal genomic information. In the case of rare variants, we do not typically have population-level statistical data from which to draw inferences about the potential consequences of the genetic variants. One approach for a biological investigation of such rare variants is to integrate personal genomic information with functional biological maps such as known biological pathways. This figure shows a schematic representation of a metabolic signaling pathway associated with insulin signaling and Type II diabetes mellitus as it is represented in the Kyoto Encyclopedia of Genes and Genomes (KEGG) biological pathway database. The stars indicate genes for which an individual was found to have potentially damaging non-synonymous mutations (i.e. potentially deleterious changes to the protein coding sequence) based on an analysis of their personal genomic information. Such information could be used as the basis to formulate or investigate various biological hypotheses concerning individual physiology.
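Placing stars on a pathway map reduces to intersecting the genes in the pathway with the genes in which an individual carries potentially damaging variants. The sketch below illustrates that step under stated assumptions: the gene names are labels that appear in Figure 4.15, but the variant records, their identifiers, and their effect annotations are hypothetical placeholders and do not describe any real individual. The function name `flag_pathway_genes` is also an assumption.

```python
# Gene labels that appear in the pathway graphic.
PATHWAY_GENES = {"INS", "PDX-1", "MafA", "GK", "PYK", "GLUT2", "SUR1", "Kir6.2"}

# Hypothetical variant calls for illustration; not real data.
VARIANTS = [
    {"id": "var1", "gene": "GK", "effect": "nonsynonymous"},
    {"id": "var2", "gene": "GK", "effect": "synonymous"},
    {"id": "var3", "gene": "SUR1", "effect": "nonsynonymous"},
    {"id": "var4", "gene": "TP53", "effect": "nonsynonymous"},
]


def flag_pathway_genes(pathway_genes, variants, damaging=frozenset({"nonsynonymous"})):
    """Return {gene: [variant ids]} for pathway genes that carry at least
    one variant whose annotated effect is in the 'damaging' set.
    """
    flagged = {}
    for variant in variants:
        if variant["gene"] in pathway_genes and variant["effect"] in damaging:
            flagged.setdefault(variant["gene"], []).append(variant["id"])
    return flagged
```

The genes returned by such a function would be the ones marked with a star when the pathway is drawn. Deciding which variants count as potentially damaging is a separate analysis that this sketch takes as given.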