Using the NCBI Map Viewer to Browse Genomic Sequence Data
UNIT 1.5
The NCBI Map Viewer is an interface to a large, integrated set of genomic data, including
sequence, cytogenetic, genetic linkage, and radiation hybrid maps, as well as
the assembled and annotated genomic sequence itself. Along with the UCSC Genome
Browser (UNIT 1.4) and Ensembl (UNIT 1.15), it is one of the primary Web sites from which
genome sequence data can be accessed.

This unit includes an introduction to the Map Viewer (see Basic Protocol), which describes
how to perform a simple text-based search of genome annotations to view the
genomic context of a gene, navigate along a chromosome, zoom in and out, and change
the displayed maps to hide and show information. It also describes some of NCBI’s
sequence-analysis tools, which are provided as links from the Map Viewer. The Alternate
Protocols describe different ways to query the genome sequence, and also illustrate
additional features of the Map Viewer. Alternate Protocol 1 shows how to perform
and interpret the results of a BLAST search against the human genome. Alternate
Protocol 2 demonstrates how to retrieve a list of all genes between two STS markers.
Finally, Alternate Protocol 3 shows how to find all annotated members of a
gene family.

At the time of this writing, NCBI provides Map Viewers for eleven vertebrates, six
invertebrates, three protozoa, nine plants, and fourteen fungi. Although the data themselves
are different for each organism, the basic navigation principles are the same. The Basic
Protocol and Alternate Protocols 1 and 2 are illustrated with examples from the human
genome, while Alternate Protocol 3 uses the mouse genome.

BASIC PROTOCOL
GENERAL NAVIGATION IN THE NCBI MAP VIEWER
This protocol introduces the basic concepts of the Map Viewer interface, including how
to perform a text-based search of genome annotations to view the genomic context of a
gene, navigate along a chromosome, zoom in and out, and change the displayed maps
to hide and show information. It also describes some of NCBI’s sequence-analysis tools
that are provided as links from the Map Viewer. The figures shown in this protocol
illustrate examples using the Map Viewer for the human genome, but Map Viewers are
also provided for other organisms (see above).

Necessary Resources
Hardware
Computer with Internet access

Software
An up-to-date Internet browser, such as Internet Explorer
(http://www.microsoft.com/ie); Netscape (http://browser.netscape.com);
Firefox (http://www.mozilla.org/firefox); or Safari (http://www.apple.com/
safari)
1. Start at the NCBI home page, http://www.ncbi.nlm.nih.gov/.

Contributed by Tyra G. Wolfsberg
Current Protocols in Bioinformatics (2006) 1.5.1-1.5.22
Copyright © 2006 by John Wiley & Sons, Inc.

2. Jump to the NCBI Map Viewer using the link to Map Viewer on the right (Hot Spots)
sidebar.
The resulting page lists the available Map Viewers, organized by organism type. For
most organisms, only a single set of maps is available. However, NCBI makes two maps
available for both human and mouse: the most recently updated version (build 36), as
well as an older version (build 35). Click on an organism name to retrieve the main Map
Viewer page for that organism, or click on the orange BLAST link to begin a BLAST
search on sequences from that organism.
3. To perform a text-based search directly from this main page, select the organism
name from the pull-down menu at the top of the page and enter a query in the text
box.
The figures in this protocol illustrate a query for the acetylcholinesterase gene,
or ACHE, on the most recent version of the human genome assembly. Thus, type
acetylcholinesterase into the text box at the top of the page, select “Homo
sapiens (human) Build 36” from the pull-down menu, then hit the Go! button. The Map
Viewer supports many types of queries in addition to gene name, including gene product,
accession number, protein domain, and marker name. Additional queries are described
in the Alternate Protocols.
4. The top of the search results page that then appears shows a schematic of the human
chromosomes, with the position(s) of the query marked in red. The middle blue bar
shows the total number of hits to the query, and the table at the bottom of the page
shows the details.
Sample search results are shown in Figure 1.5.1. NCBI displays up to five different
human genome assemblies, described at
http://www.ncbi.nlm.nih.gov/mapview/static/humansearch.html#assembly. Reference is
assembled from High Throughput Genomic sequences by NCBI’s build process
(http://www.ncbi.nlm.nih.gov/genome/guide/build.html). Celera is assembled from
whole-genome shotgun sequences. DR52 and DR53 are reference sequences for the DR52 and
DR53 haplotypes in the Major Histocompatibility Locus and are specific to chromosome 6;
HSC TCAG is an assembly of chromosome 7 from The Center for Applied Genomics.
The query term acetylcholinesterase yields seven hits in the human genome. The hits on
chromosomes 3 and 14 are genes that contain the word “acetylcholinesterase” in their
name, and the hit on chromosome 7 is to the acetylcholinesterase gene itself.

Figure 1.5.1 The main entry point for the NCBI Map Viewer, showing the results of a query for
acetylcholinesterase on build 36 of the human genome.

5. To view the genomic context of a hit, including all maps onto which it has been
placed, click on the name of the hit in the Map Element column in the table under the
schematic of chromosomes. Click on the name of an individual map to launch a view
of the query term placed on that map alone. To see all hits on a single chromosome,
click on the chromosome number in the top graphic, or on the “all matches” link
adjacent to the chromosome number in the results table.
To generate the view shown in Figure 1.5.2, click on the term ACHE in the Map Element
column for the Reference assembly.

6. Look at the default Map Viewer display for the map element selected in the previous
step.
Figure 1.5.2 shows the genomic context of the ACHE gene. Thirty genes surrounding
ACHE are shown in this view. The sequence coordinates on chromosome 7 (98,240 K-
101,230 K bp) are indicated at the top of the page in the Region Displayed, as well as
in the two text fields on the left blue sidebar. In the ideogram in the blue sidebar, the
region displayed is indicated in red, relative to the known cytogenetic banding patterns
on chromosome 7.
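
The Region Displayed is reported in thousands of base pairs (K). When such coordinates are copied into other tools, it can help to convert them to plain base-pair integers. The following is a minimal sketch only; the function names parse_position and parse_region, and the assumption that the text looks like "98,240 K-101,230 K bp", are illustrative and are not part of the Map Viewer.

```python
import re

# Multipliers for the unit suffixes used in the display (K = thousand, M = million).
_UNITS = {"": 1, "K": 1_000, "M": 1_000_000}

def parse_position(text):
    """Convert a position such as '98,240 K' or '101230K' to an integer in bp."""
    match = re.fullmatch(r"\s*([\d,]+(?:\.\d+)?)\s*([KkMm]?)\s*(?:bp)?\s*", text)
    if match is None:
        raise ValueError(f"unrecognized position: {text!r}")
    number = float(match.group(1).replace(",", ""))
    return int(round(number * _UNITS[match.group(2).upper()]))

def parse_region(text):
    """Split 'start-stop' (assumed layout) and return (start_bp, stop_bp)."""
    start, stop = text.split("-", 1)
    return parse_position(start), parse_position(stop)

start, stop = parse_region("98,240 K-101,230 K bp")
print(start, stop, stop - start)
```

The last printed value is the width of the displayed region in base pairs.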

Figure 1.5.2 The default Map View for the query acetylcholinesterase.

Details are shown in the three default maps in the center of the window. The position of
the query, acetylcholinesterase, is shown in red and/or pink on each map. The Genes cyto
(Genes Cytogenetic) map shows the cytogenetic locations of genes as reported in Entrez
Gene. Hs UniG, or UniGene Human, shows human mRNA and EST sequences aligned
to the genome, and named with their UniGene cluster identifier. The gray histogram
shows the density of aligned ESTs and mRNAs in this region, and the blue lines show
putative intron/exon boundaries, with exons as thick blue lines. The map on the right
side of the display is known as the master map, and is always shown in verbose mode,
with more details than the maps to the left. In this case, the master map is the Genes seq
(Genes sequence) map, which shows annotated gene models. If a gene is alternatively
spliced, all possible exons will be indicated on the Genes seq map (in comparison, the
RefSeq Transcripts map shows only exon combinations that exist in individual mRNA
sequences). Exons are shown as thick bars and introns as thin lines. The arrow to the
right of the gene name indicates the direction of transcription. These maps, as well as the
others hidden in this view, are described in more detail in the section of the online help
documentation at http://www.ncbi.nlm.nih.gov/mapview/static/humansearch.html.
7. Explore the Genes sequence map in more detail. The Genes sequence map shows
both known and putative genes that NCBI has annotated on the genomic contigs.
This map shows all possible exons for genes, including alternative splices; to see
individual transcripts, use the RefSeq Transcripts map, as detailed in Alternate
Protocol 1. Exons are depicted as boxes, and introns as the lines between them.
Coding exons are shaded boxes; untranslated exons are unfilled boxes. A black
arrow to the right of the gene symbol indicates the direction of transcription of the
gene. Furthermore, genes transcribed from the bottom of the display up are displayed
to the left of the gray line and genes transcribed from the top down are displayed
to the right. NCBI creates the following four types of gene models for the human
genome assembly. The model type is indicated in the column labeled E when the
Genes sequence map is the master map.

a. best RefSeq: The model is supported by the best alignment of an mRNA Reference
Sequence (Pruitt et al., 2005) to the genome sequence.
b. mRNA: The model is supported by the alignment of other transcripts to the genome
sequence.
c. protein: The model is supported by the alignment of a protein sequence to the
genome sequence.
d. external: The model is provided by an outside source and NCBI does not indicate
what evidence was used to predict it.

When the Genes sequence map is the master map and is on the right side of the Map
Viewer display, up to ten additional annotations are available for each gene.

a. OMIM, Online Mendelian Inheritance in Man, is a continuously updated catalog
of human genes and genetic disorders (UNIT 1.2).
b. HGNC, the HUGO Gene Nomenclature Committee, provides a unique name for
each human gene.
c. sv, sequence viewer, provides a graphical representation of the gene, including
annotated features like coding region (CDS), RNA, and gene. The sequence is
displayed as a graphic and cannot be copied and pasted from this view.
d. pr links to a list of protein sequences encoded by the gene.
e. dl, Sequence Download, allows the user to retrieve the genomic sequence or
annotation of the gene in text format. To retrieve a different region, change the
coordinates listed in the text box. The region can be returned either as sequence in
FASTA format or in GenBank format (APPENDIX 1B). The GenBank format shows
all the features that have been annotated on the selected region, including mRNA
and CDS. Note that, at the top of the page, the location of the region is shown in
chromosomal coordinates. Further down, the location is shown on the coordinates
of the NT_###### contig that spans this region.

Figure 1.5.3 The Evidence Viewer for exon 3 of acetylcholinesterase. This window shows the
alignment of the genomic contig NT_007933.14 with four RefSeq mRNAs (NM_######) and nine
GenBank mRNAs.

f. ev, evidence viewer, displays the biological evidence supporting a particular gene
model. This view shows all GenBank and RefSeq mRNAs, ESTs, and model
exons in graphical fashion, and also displays alignments of the mRNAs with the
genomic sequence. Figure 1.5.3 shows the evidence viewer for a portion of exon
3 of ACHE, with mismatches between the genomic and mRNA sequences shown
in red. For additional information, click on Evidence Viewer Help on any ev
page. Sequences in the alignment can be copied and pasted into other computer
applications.
g. mm links to the Model Maker, which shows the exons that result when GenBank
mRNAs and gene predictions are aligned to the genomic sequence. ESTs can be
added as well. The user can then select individual exons to create a custom model
of the gene.
The Model Maker for ACHE (Fig. 1.5.4) shows 15 transcripts that align to the genome
in this region. Together, these 15 sequences contribute 14 potential exons to the ACHE
gene model, numbered and shown in green. To create a model, click on individual
exons in the Putative Exons section. A graphic will appear first, followed by a DNA
sequence in the top box, and a three-frame translation of this sequence in the bottom
three boxes. For example, the novel transcript produced by splicing together exons
1-7-8-12-13 is shown in Figure 1.5.4. Additional features of the Model Maker are
described in the Help Document, available by clicking help on any mm page.

Figure 1.5.4 The Model Maker for acetylcholinesterase, showing the exons contributed by 12
mRNAs (light blue) and three Gnomon gene predictions (hmm######). Together, these 15
sequences contribute 14 potential exons, numbered and shown in green. For the color version of
this figure go to http://www.currentprotocols.com.

h. hm, HomoloGene, is a system for automated detection of homologs among the
annotated genes of several completely sequenced eukaryotic genomes.
i. sts links to markers in NCBI’s UniSTS.
j. CCDS, the Consensus CDS project, is a collaborative effort involving NCBI that
aims to identify a set of consistently annotated, high-quality human protein coding
regions. Gene models with a CCDS link should be considered a “gold standard,”
as these models were identified by multiple gene prediction methods.
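
The three-frame translation that the Model Maker displays for a custom model (item g) can be reproduced offline for a sequence copied from the page. The sketch below is illustrative and is not the Model Maker's own code: it uses the standard genetic code, translates only the three forward frames, and the input sequence is a made-up example rather than ACHE sequence.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna, frame=0):
    """Translate one forward frame (0, 1, or 2); unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def three_frame_translation(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]

# Hypothetical 11-base input, used only to show the three frames.
for frame, protein in enumerate(three_frame_translation("ATGGCCTGAAC"), start=1):
    print(f"frame {frame}: {protein}")
```

Stop codons are written as an asterisk, so an open reading frame appears as a run of letters without an asterisk.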
8. The NCBI Map Viewer provides an integrated set of sequence, cytogenetic, genetic
linkage, radiation hybrid, and YAC contig maps. Use the Maps & Options link in
the blue sidebar on the left of the page (or near the top of the page on the right side)
to hide and show maps in the Map Viewer. It is also possible to change the position
shown in the Map Viewer by entering the new coordinates in the Maps & Options
window. All of the human sequence maps are based on the same coordinate system,
that of the assembled genome sequence. Connections are drawn between other maps
when the same object has been placed on two maps, for example, an STS or a gene
name. The elements will be connected even if they are known by different names on
different maps.
A view of the Maps & Options window is shown in Figure 1.5.5. To generate this view, open
the Maps & Options window by clicking either of the abovementioned links, highlight
Gene Cyto and ugHs, then click REMOVE to remove them from the list. Next, add the
maps called Phenotype and Variation by highlighting their names and clicking ADD.
Make sure that the Genes sequence map remains as the master by selecting its name,
Gene, then clicking Make Master/Move to Bottom. Finally, click OK to update the Map
Viewer with these new selections. The resulting page is shown in Figure 1.5.6.

Figure 1.5.5 The Maps & Options window, which allows the user to alter the default Map Viewer
settings.

The Phenotype, or Pheno map (Fig. 1.5.6) displays the placement of loci associated
with phenotypes on the human genome assembly. Phenotypes include those described in
Online Mendelian Inheritance in Man (OMIM) as well as quantitative trait loci (QTLs).
Phenotype names are linked to descriptions in OMIM. The Variation map shows the
position of genetic variation data from NCBI’s dbSNP. To conserve space on the display,
the positions of SNPs are shown as a histogram when many SNPs are present. In zoomed-in
regions with few SNPs, SNP names are linked to dbSNP.

9. NCBI provides two ways to adjust the focus of the display. Use the zoom control bar
in the blue sidebar to change the view to the full chromosome (widest bar at the
top), 1/10,000 of the chromosome (narrowest bar at the bottom), or some interval
in between. Alternatively, place the mouse cursor over one of the maps in the Map
Viewer so that it assumes the shape of a hand, click once, and select one of the
options from the pop-up menu:
Recenter [the display around that point]
Zoom in x2
Zoom in x4
Zoom in x8
Zoom out x2
Show [a defined sequence interval of] 10 M, 1 M, 100 K, or 10 K
Show Sequence [of the genome in this region].
Continuing the above example, place the mouse cursor over the Genes sequence map
shown in Figure 1.5.6, click once on the map location (not name) of ACHE to open the
pop-up menu, and select “Zoom in x8.” The map location of ACHE is slightly below its
name, and the two are connected by a faint gray line. The resulting window will display
a closer view of ACHE. To zoom in even more, click on the map location of ACHE again
and select “Zoom in x8.” Repeat the zoom a third time for an even closer view of ACHE.

Figure 1.5.6 A customized view of the genomic context of the acetylcholinesterase gene, using
the maps selected in the Maps & Options shown in Figure 1.5.5.
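
The pop-up menu options in step 9 amount to simple arithmetic on the displayed interval. The following sketch shows one way to predict the coordinates that result from recentering or zooming. All function names are hypothetical, the click position is invented for illustration, and the Map Viewer's actual rounding and its handling of chromosome ends are not described in this unit, so the sketch only clamps the lower bound at 1.

```python
def _window(center, width):
    """Return a window of `width` bp centered on `center`, starting no lower than 1."""
    half = width // 2
    return max(1, center - half), center + half

def recenter(start, stop, center):
    """Keep the current width but center the display on `center`."""
    return _window(center, stop - start)

def zoom_in(start, stop, factor, center=None):
    """Shrink the displayed width by `factor` (e.g., 2, 4, or 8)."""
    center = (start + stop) // 2 if center is None else center
    return _window(center, (stop - start) // factor)

def zoom_out(start, stop, factor, center=None):
    """Enlarge the displayed width by `factor`."""
    center = (start + stop) // 2 if center is None else center
    return _window(center, (stop - start) * factor)

def show(center, width):
    """Show a defined interval (e.g., 10 M, 1 M, 100 K, or 10 K) around `center`."""
    return _window(center, width)

# Default view from step 6, then three successive "Zoom in x8" clicks
# at a hypothetical click position.
region = (98_240_000, 101_230_000)
click = 100_000_000
for _ in range(3):
    region = zoom_in(*region, factor=8, center=click)
    print(region, "width:", region[1] - region[0])
```

Three successive x8 zooms reduce the displayed width by a factor of 512, which is why the third zoom shows individual exons of a single gene.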

10. Additional navigation is also available in the Map Viewer window. Clicking the blue
arrow next to a map name will make that map the master and move it to the rightmost
position. The blue “X” next to a map name is used to remove that map from the view.
Small arrows on the top and bottom of each map scroll the display up and down.
Alternatively, if one knows the exact position along the chromosome that one wishes
to view, one may enter its coordinates in Region Shown in the blue sidebar, then hit
Go.
Click on the arrow next to the Variation map to make it the master.

11. The map on the right side of the display, the master map, has an enhanced display
compared to the other maps.
Figures 1.5.2 to 1.5.4 demonstrate some of the additional features available when
Genes sequence is the master map. Figure 1.5.7 shows an example of the SNP properties
that are visible when Variation is the master map. The graphic in the Map column
indicates the status of the mapping of the SNP to the genome assembly. The Gene column
indicates if the SNP is part of a gene. Heterozygosity indicates average heterozygosity of
the SNP. The Validation column distinguishes validated from unvalidated SNPs. The final
column presents miscellaneous information. A detailed figure legend is available from
http://www.ncbi.nlm.nih.gov/mapview/static/humansearch.html#snp.
In this example, there are ten SNPs for which the L (locus), T (transcript), and C (coding
sequence) in the Gene column are colored (rs17886728, rs1799806, rs7636, rs3028261,
rs8286, rs17885778, rs1056867, rs17234982, rs13246682, and rs17881553). These ten
SNPs occur in the coding sequence of the ACHE gene. More information on each SNP,
including the sequence and whether the SNP is synonymous or nonsynonymous, is available
by clicking on the SNP identifier and linking to NCBI’s dbSNP. According to the Summary
of Maps at the bottom of the page, only 28 of the 52 SNPs in this region are labeled (see a
similar region in Fig. 1.5.12). Some of these SNPs fall into the bins called “4 variations”;
others are indicated in the graphic as unlabeled tick marks. Some of these unlabeled SNPs
could also fall in the coding sequence of ACHE.

Figure 1.5.7 A zoomed-in view of the display in Figure 1.5.6, showing the region around
acetylcholinesterase in detail.

ALTERNATE PROTOCOL 1
BLAST SEARCH AGAINST THE GENOME
There are many ways to access the NCBI’s Map Viewer in addition to a basic query
by gene name. One of these is to perform a BLAST search (see UNITS 3.3 & 3.4), using
either a protein or nucleotide sequence as a query. This example is illustrated using the
Map Viewer for the human genome. The example also explains how to view individual
transcripts that have been aligned to the genome.

Necessary Resources
Hardware
Computer with Internet access

Software
An up-to-date Internet browser, such as Internet Explorer
(http://www.microsoft.com/ie); Netscape (http://browser.netscape.com); Firefox
(http://www.mozilla.org/firefox); or Safari (http://www.apple.com/safari)
1. Start at the NCBI’s BLAST page, http://www.ncbi.nlm.nih.gov/BLAST/. The section
of the page labeled Genomes displays links to organism-specific genomic BLAST
databases. Alternatively, these genomic databases can also be accessed from the Map
Viewer for certain organisms. Select the page for the appropriate organism.
For this example, select Human. A number of different nucleotide and protein databases
related to the human genome sequencing project are available from this page. Descriptions
of these databases may be viewed by clicking the Database link (i.e., the word Database
in blue near the top of the page). The assembled genome itself is in the database called
“genome.”

2. Enter either a sequence accession or “gi” number, or a FASTA-formatted sequence
(see APPENDIX 1B) for the query into the box on the BLAST form, choose the correct
BLAST program, change any default parameters, and click Begin Search.
For a complete discussion of the BLAST program and parameters, see UNITS 3.3 & 3.4.
A gi number is a unique integer assigned to a sequence; a new gi number is assigned
whenever the sequence is updated.
The examples shown in Figures 1.5.8, 1.5.9, and 1.5.10 are of a megaBLAST search against
the “genome (all assemblies)” database using the accession number for the ACHE mRNA
RefSeq, NM_000665, as a query. All other parameters were left in their default settings.
megaBLAST is a version of BLAST that is optimized for quickly comparing highly related
nucleotide sequences.
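
When a query is pasted as sequence rather than as an accession, it should be in FASTA format: a header line beginning with ">" followed by one or more lines of sequence. The sketch below is a simple local check of pasted text before it is submitted; the function name parse_fasta and the example record are hypothetical, and the BLAST form may accept input that this strict check rejects.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Made-up record, used only to show the layout.
example = """>query1 hypothetical example sequence
ATGGCCTGA
ACGT
"""
print(parse_fasta(example))
```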

Figure 1.5.8 The results of a megaBLAST search against the human “genome” database using
the accession number for the ACHE mRNA RefSeq, NM_000665.

Figure 1.5.9 The Genome View of the results of the BLAST search shown in Figure 1.5.8, as
shown in the Map Viewer results page.

3. The first page returned from the BLAST server allows the user to change various
formatting options, as well as see the results. Click the Format! button to check
the results. When the search is complete, the resulting page is similar to a standard
BLAST report.
Near the top of the results page is a schematic showing the position of the hit on the
query sequence (Fig. 1.5.8). The top red line indicates the query sequence, and lines
under this depict the alignment of database sequences with the query, color-coded by
BLAST score. In this case, the query aligns with three database sequences throughout its
length, and with the same high score. The contig in the first alignment, NT_007933.14,
is from the reference genome assembly. The contig in the second alignment is from the
Celera genome assembly. The contig in the third alignment is from the TCAG assembly of
chromosome 7. The bottom of the BLAST results page depicts the alignment of the query
with the database sequences.

4. Click on the Genome View button on the BLAST results page (Fig. 1.5.8) to link to
an overview of the genomic context of the hit(s).
The Genome View, shown in Figure 1.5.9, is very similar to the initial Map Viewer results
page (Fig. 1.5.1), with the BLAST hits shown as the queries. The schematic at the top
of the page shows the location of the BLAST hits on the chromosomes. The blue bar below
the BLAST scores legend provides the name of the query sequence, extracted from the
accession number provided in step 2, above. The next section shows the details of the
database hits from the BLAST search. Starting on the left, the Chr column provides the
chromosome number of the hit, while the Assembly refers to the genome assembly. The
Map element is the accession number of the contig. The number of Hits is the number
of blocks in the sequence alignment. The five blocks in each of the hits to chromosome 7
represent alignments between the five exons of the ACHE mRNA and the genome. Each
of the chromosome 7 hits has the same Score and E-value, because the three sequence
assemblies are identical, or nearly identical, in this region. There are also two lower-scoring
hits to chromosome 16.

Figure 1.5.10 The genomic context of the results of the BLAST search shown in Figure 1.5.8.

5. To launch the Map Viewer and view the genomic context of an individual hit, click
on an item in the Map Element column. The BLAST hits are integrated into the
Contig map, the master map in this view. The positions of the individual BLAST hits
are depicted as bars and color-coded by score. Highlighted in pink are the position
of the hit in the query, as well as the percent identity between the query and the
genomic sequence.
To view the data shown in Figure 1.5.10, click on NT_007933 in the Map Element column
shown in Figure 1.5.9. Four of the five BLAST hits correspond to exons drawn on the
Genes sequence map, while the fifth hit is shorter than the corresponding Genes sequence
exon. The explanation for this discrepancy is that the Genes sequence map depicts a
flattened view of all exons for a particular gene model. The RefSeq RNA map depicts indi-
vidual transcripts, and indicates that there are two alternatively spliced forms of ACHE,
which differ in the length of the final exon (Figure 1.5.10). The BLAST hit corresponds to
the form with the shorter final exon. As the Genes sequence map depicts a composite view
of the exons, the final exon appears long. The map on the left, the Model map, depicts ab
initio gene models predicted by Gnomon.
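
The difference between the flattened Genes sequence view and the individual transcripts can be illustrated by merging exon intervals. This is a minimal sketch under stated assumptions: the coordinates are invented, the function name flatten_exons is hypothetical, and NCBI's actual procedure for drawing the composite gene model is not described in this unit.

```python
def flatten_exons(transcripts):
    """Merge the exons of all transcripts into one non-overlapping set."""
    intervals = sorted(exon for exons in transcripts for exon in exons)
    merged = []
    for start, stop in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous exon: extend it if this one ends later.
            merged[-1] = (merged[-1][0], max(merged[-1][1], stop))
        else:
            merged.append((start, stop))
    return merged

# Hypothetical coordinates: two transcripts that differ only in the final exon.
short_form = [(100, 200), (300, 400), (500, 600)]
long_form = [(100, 200), (300, 400), (500, 750)]
print(flatten_exons([short_form, long_form]))
```

In the merged result the final exon takes the length of the longer form, which is why a BLAST hit from the shorter transcript appears shorter than the exon drawn on the Genes sequence map.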

6. This view can now be manipulated as described above (see Basic Protocol, steps 7
to 11).
ALTERNATE PROTOCOL 2
USING THE NCBI MAP VIEWER TO VIEW A REGION BETWEEN TWO MARKERS
A third way to access the NCBI’s Map Viewer is to view a region between two markers.
Such a strategy would be useful in a positional cloning project, as one can quickly view
all the genes in a critical region defined by two markers. This example is illustrated using
the Map Viewer for the human genome.
Necessary Resources
Hardware
Computer with Internet access

Software
An up-to-date Internet browser, such as Internet Explorer
(http://www.microsoft.com/ie); Netscape (http://browser.netscape.com); Firefox
(http://www.mozilla.org/firefox); or Safari (http://www.apple.com/safari)
1. Start at the NCBI home page, http://www.ncbi.nlm.nih.gov/.
2. Jump to the Human Genome Resources page, one of the Hot Spots listed on the
right side of the home page. The Human Genome Resources page can also be
directly accessed at http://www.ncbi.nlm.nih.gov/genome/guide/human/. On the Human
Genome Resources page, to query the Map Viewer, either select the Map Viewer
or choose Human Genome (Map Viewer) from the pull-down menu at the top of the
page.
3. Search for the region containing two markers by querying for both terms separated
by the word OR.
For the example in this protocol, the critical region is the <2 Mb region containing the two
STS markers RH46231 and RH71410. Thus, type RH46231 OR RH71410 in the query
box. As the search engine recognizes most marker names and their aliases, RH46231 can
also be found by its alias stSG22199.

4. The resulting page shows the position of the two hits on the human genome. Click on
the line for an individual hit to view the genomic context of that hit. If both hits are
on the same chromosome, view them simultaneously by clicking on the chromosome
number in the top graphic, or the “all matches” link in the bottom table.
For this example, click on “all matches” on the reference assembly of chromosome 7 in
the results table.

5. The Map Viewer returns a zoomed-out view showing the genomic region surrounding
the two markers. This is too broad a view if one wishes to analyze only those
genes that are between the two markers. To limit the view to the region precisely
between the two markers, type the marker names in the text boxes in the blue sidebar,
then hit “Go.”
To continue the example, enter RH46231 into the top text box in the blue sidebar on
the left-hand side of the screen and enter RH71410 into the bottom box, then click
“Go.” Figure 1.5.11 shows the resulting view of the region of chromosome 7 between the
STS markers RH46231 and RH71410. The STS map shows the placement of STSs from
various sources onto the genome using Electronic PCR (e-PCR; Schuler, 1998). e-PCR
is a computational method that predicts the location of sequence tagged sites in DNA by
searching for subsequences that closely match the PCR primers used to make the STS,
and which also have the correct order, orientation, and spacing such that they could prime
the amplification of a PCR product of the correct molecular weight. The two STS markers
used in the search are highlighted in pink, at the top and bottom of the interval. The STS
map is the master map. In this case, the additional information to the right of the map is a
table indicating which genetic and RH maps each STS has been placed on.

Figure 1.5.11 The genomic context of the region between RH markers RH46231 and RH71410.
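
The idea behind e-PCR described above (primers found in the correct order, orientation, and spacing) can be sketched in a few lines. This is a deliberately simplified illustration, not the e-PCR program: primers must match exactly, only one strand of the template is searched, product size in base pairs stands in for molecular weight, and all names and sequences are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))

def find_all(sequence, pattern):
    """Yield every 0-based start position of `pattern` in `sequence`."""
    position = sequence.find(pattern)
    while position != -1:
        yield position
        position = sequence.find(pattern, position + 1)

def simple_epcr(sequence, forward, reverse, expected_size, tolerance=50):
    """Return (start, stop, size) for each predicted product (0-based, stop exclusive)."""
    sequence = sequence.upper()
    forward = forward.upper()
    # The reverse primer anneals to the opposite strand, so search for its
    # reverse complement downstream of the forward primer.
    reverse_site = reverse_complement(reverse)
    hits = []
    for start in find_all(sequence, forward):
        for site in find_all(sequence, reverse_site):
            stop = site + len(reverse_site)
            size = stop - start
            in_order = site >= start + len(forward)
            if in_order and abs(size - expected_size) <= tolerance:
                hits.append((start, stop, size))
    return hits

# Made-up template: forward primer site, 20 bases of filler, reverse primer site.
template = "GG" + "ACGTAC" + "A" * 20 + "TCTAGG" + "AA"
print(simple_epcr(template, "ACGTAC", "CCTAGA", expected_size=32, tolerance=0))
```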

Figure 1.5.12 The Summary of Maps from the region displayed in Figure 1.5.11. The summary
highlights the fact that although there are 62 genes in this region, only 50 of them are labeled with
a gene symbol.

At the bottom of the display, below the graphic, is a section entitled Summary of Maps
(Fig. 1.5.12). For the Genes sequence map, for example, this summary reveals the coordinates
being displayed in the graphic and the total number of genes on the chromosome.
It also indicates the number of genes in the region shown in the graphic, 62, versus the
number of genes whose names are labeled in the graphic, 50 in this case. It is important
to remember that if space is limiting, the Map Viewer only displays a portion of the data
in a given region. In order to view the names of all the genes in the region, one could
zoom in using the zoom controls described in the Basic Protocol. Alternatively, one could
increase the page length of the graphic by typing a larger number into the Page Length
box of the Maps & Options window (Fig. 1.5.5).

Figure 1.5.13 A text view of the genes located between the two RH markers RH93969 and
RH71410. This view was generated from the page shown in Figure 1.5.11 by clicking on the link
to Data as Table View.

The other maps displayed in the default include the UniGene Human (Hs UniG) and
Genes sequence (Genes seq) described above, as well as the GeneMap99-GB4 (GM99-
GB4), a map of markers mapped onto the GB4 RH panel by the International Radiation
Hybrid Consortium. The red lines show the position of the STS markers on the maps.

6. View all the genes between the two markers by selecting Data As Table View from
the left blue sidebar.
The Data as Table View, shown in Figure 1.5.13, will provide a table of all the data in
the current window. The view may appear slowly in the Web browser, especially if the
genomic interval is large and contains much data. However, unlike the graphic, this view
will report all available data in the region. To improve the download speed, remove any
unneeded maps by clicking on the X next to the map name before selecting the Data as
Table View link. In the resulting table, the Genes on Sequence section shows the start
and stop position of all genes in this interval, the gene symbol, orientation of the gene
along the chromosome, links to some additional views of the gene such as sv (sequence
viewer) and ev (evidence viewer) discussed in the Basic Protocol, the evidence for the
gene, cytogenetic position, and full name. The data on the genes in this critical region
can be examined directly from the Web browser. For researchers who prefer to track their
data in spreadsheets, the table can be saved and imported into outside programs such as
Microsoft Excel.
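
For readers who would rather script this step than use a spreadsheet, the following is a minimal sketch in Python. It assumes that the Genes on Sequence section has been saved as tab-delimited text with a header row. The column names (start, stop, symbol, orient) and the sample rows are hypothetical placeholders, not the Map Viewer's actual export format, and should be adjusted to match the saved file.

```python
import csv
import io

# Hypothetical excerpt of a saved table; real column names and values will differ.
SAMPLE = (
    "start\tstop\tsymbol\torient\n"
    "1000\t5000\tGENE_A\t+\n"
    "7000\t9500\tGENE_B\t-\n"
    "12000\t12800\tGENE_C\t+\n"
)

def read_genes(handle):
    """Return one dict per gene row, with coordinates converted to integers."""
    genes = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["start"] = int(row["start"])
        row["stop"] = int(row["stop"])
        genes.append(row)
    return genes

def genes_in_interval(genes, left, right):
    """Keep genes lying entirely between two coordinates (e.g., two marker positions)."""
    return [g for g in genes if g["start"] >= left and g["stop"] <= right]

genes = read_genes(io.StringIO(SAMPLE))
selected = genes_in_interval(genes, 6000, 13000)
print([g["symbol"] for g in selected])
```

To read a real file instead of the embedded sample, the `io.StringIO(SAMPLE)` argument could be replaced with an open file handle, for example `open("saved_table.txt", newline="")`, where the file name is again only a placeholder.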

7. Make changes to the display as necessary (see Basic Protocol, steps 6 to 11).

ALTERNATE PROTOCOL 3: USING THE NCBI MAP VIEWER TO SEARCH FOR ALL MEMBERS OF A
GENE FAMILY IN THE MOUSE GENOME
This protocol demonstrates some advanced features of the Map Viewer text-based query.
The example is illustrated using the Map Viewer for the mouse genome.

Necessary Resources
Hardware
Computer with Internet access

Software
An up-to-date Internet browser, such as Internet Explorer
(http://www.microsoft.com/ie); Netscape (http://browser.netscape.com); Firefox
(http://www.mozilla.org/firefox); or Safari (http://www.apple.com/safari)
1. Start at the NCBI home page, http://www.ncbi.nlm.nih.gov/.
2. Jump to the NCBI Map Viewer using the link to Map Viewer on the right (Hot Spots)
sidebar.
3. Click on the appropriate organism name to get to the main search page for that
organism, and query for the gene family of interest by entering the terms in the Search
for box. If the Map Viewer displays more than one assembly for that organism, it
will be easier to understand the results if the query is limited to a single assembly
using the assembly pull-down menu. Next, click the Find button. To find
all members of a gene family, it is usually necessary to use the asterisk (*) as a
wildcard to represent any number of characters. To limit the search to gene symbols,
restrict the search with the term [sym].
To find all mouse members of the ADAM gene family on the most recent assembly of the
mouse reference genome, click on “Mus musculus (mouse) Build 36” and, on the page
that appears, change the “assembly” pull-down menu at the top of the page from “All”
to “reference.” One might start the search by typing the term ADAM in the “Search for”
text box at the top of the page. This search does not return the desired outcome: only
13 hits are returned, and some of these (for example, LOC667041) may not be members
of the ADAM family. A query for ADAM [sym] returns no matches, as there are no
genes with the symbol ADAM. Searching with the term ADAM* returns 84 hits, most of
which are correct; however, some unwanted hits such as LOC667041 are included. A
search for ADAM* [sym] is almost correct, but the list of 81 hits includes genes that are
members of the ADAM, ADAMTS, and ADAMdec families. To eliminate the
ADAMTS and ADAMdec family members, add the terms NOT ADAMTS* [sym] NOT
ADAMDEC* [sym] to the query. The results of the final query, ADAM* [sym] NOT
ADAMTS* [sym] NOT ADAMDEC* [sym], which generates the desired 51 hits, are
shown in Figure 1.5.14. Note that this text-based search returns only those genes whose
symbol begins with the term ADAM. Additional unnamed members of the ADAM gene
family may be found in a BLAST search (UNITS 3.3 & 3.4).
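
The logic of the wildcard and NOT terms can be illustrated offline with a short Python sketch. This is not how the Map Viewer search is implemented; it only mimics the include/exclude behavior described above on a small list of symbols, assuming case-insensitive matching. Adam2, Adam7, Adam28, and LOC667041 are taken from the text; Adamts1 and Adamdec1 are added here purely for illustration, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase

# Adam2, Adam7, Adam28, and LOC667041 appear in the text; the other two
# symbols are illustrative additions chosen to exercise the NOT terms.
SYMBOLS = ["Adam2", "Adam7", "Adam28", "Adamts1", "Adamdec1", "LOC667041"]

def matches(symbol, pattern):
    """Case-insensitive wildcard match; '*' stands for any number of characters."""
    return fnmatchcase(symbol.upper(), pattern.upper())

def search(symbols, include, exclude=()):
    """Keep symbols matching `include` and none of the `exclude` patterns."""
    return [
        s for s in symbols
        if matches(s, include) and not any(matches(s, p) for p in exclude)
    ]

exact = search(SYMBOLS, "ADAM")        # like ADAM [sym]: no symbol is exactly ADAM
wildcard = search(SYMBOLS, "ADAM*")    # like ADAM* [sym]: all three families
family = search(SYMBOLS, "ADAM*", exclude=("ADAMTS*", "ADAMDEC*"))

print(exact)
print(wildcard)
print(family)
```

On this toy list, the exact term returns nothing, the wildcard alone returns symbols from all three families, and the wildcard combined with the two exclusions leaves only Adam2, Adam7, and Adam28, mirroring the progression of queries in the example.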

4. Examine the genomic context of one of the genes returned in the search by selecting
a term in the Map Element column. Alternatively, to view all hits on a single
chromosome, click on the chromosome number in the top graphic, or on the term
“all matches” next to the chromosome number in the bottom table.
For example, to see the hits to Adam2, Adam7, and Adam28 on chromosome 14, click
on “all matches” next to chromosome 14. The results are shown in Figure 1.5.15. The
four maps displayed are the four maps to which Adam2, Adam7, or Adam28 have been
mapped. The mouse maps are described in detail in the link called Mouse Maps Help
near the top of the left sidebar.

Figure 1.5.14 The results of querying NCBI’s mouse Map Viewer with ADAM* [sym] NOT
ADAMTS* [sym] NOT ADAMDEC* [sym]. This search returns all named members of the ADAM
gene family in mouse.

The Assembly map shows all the genome assemblies available in this region of the mouse
genome (see below). MGI shows a linkage map from Mouse Genome Informatics. STS
and Genes seq show the position of STSs and genes, respectively, as on the human maps.

5. The NCBI Map Viewer integrates data from the reference assembly (mouse strain
C57BL/6) as well as several alternate assemblies. The mixed-strain Celera assembly,
as well as MGSCv3 (strain C57BL/6), are nearly complete, while the assemblies from
the following mouse strains are partial: 129 substrain, A/J, B6/CBAF1J, Balb/c, C3H,
NOD, and unknown. The assemblies available in a particular genomic region are
listed in the Assembly map. The blue vertical line indicates the assembly being
viewed. By default, the reference assembly is shown in blue. The assembly can be
changed in the Maps & Options window (accessed as in the Basic Protocol).
In this region of the genome, the available assemblies are 129 substrain, Celera Mmu16,
C57BL/6, and MGSCv3 (Fig. 1.5.15). Each assembly has its own associated set of
annotations.
By default, the C57BL/6 assembly is colored blue and its annotations are shown. However, the
assembly can be changed in the Maps & Options window by selecting the assembly name
from the assembly pull-down menu and clicking Change Assembly. Maps corresponding
to that genome assembly can then be added to the display in the Maps & Options window.

6. The navigation around all Map Viewers, including that for mouse, is similar to the
procedures described in the preceding protocols for human.
Figure 1.5.15 An overview of the genomic context of the members of the ADAM gene family that
map to chromosome 14.

COMMENTARY
Background Information
The NCBI Map Viewer is currently available for 43 organisms, and, for each organism, it
integrates genomic data from a number of different sources
(http://www.ncbi.nlm.nih.gov/mapview/). For example, the human Map Viewer includes
sequence, cytogenetic, genetic linkage, and radiation hybrid maps. Some of these maps
were generated by NCBI, and others are taken from the scientific literature. An overview of
the process that NCBI uses to assemble and annotate complete genomes is provided at
http://www.ncbi.nlm.nih.gov/genome/guide/build.html.

At least three strategies are currently being used to sequence complete genomes, and
these are reviewed by Green (2001). The International Human Genome Sequencing Consortium
has relied on a method called clone-by-clone shotgun sequencing to generate the sequence
of the human genome. In brief, the genome is partitioned into a set of mapped, overlapping
clones, and these clones are subjected to shotgun sequencing. In whole-genome shotgun
sequencing, as used by Celera Genomics to decipher the sequence of the human genome, the
entire genome is fragmented into pieces, and these pieces are sequenced and assembled.
This strategy may bypass the need for a clone-based physical map, the first step in
clone-by-clone shotgun sequencing. Many other publicly available genome sequences are
being generated by a hybrid approach using both methods.

The working draft sequence of the human genome was published in 2001 (Lander et al.,
2001) and the finished genome sequence was announced in 2003 (International Human Genome
Sequencing Consortium, 2004). A sequence becomes finished when it has been determined at
an accuracy of at least 99.99% and has no gaps; sequence data falling short of that
benchmark, but which can be positioned along the physical map of the chromosomes, are
termed “draft.” Sequences of draft clones are deposited into the High Throughput Genomic
(HTG) division of DDBJ/EMBL/GenBank, where they receive an accession number. These draft
clones may contain gaps, unordered or unoriented contigs, or sequencing errors. As the
sequence of the clone is completed, the sequence represented by the accession number is
updated until the clone is considered finished. Finished clones are moved from HTG to the
Primate division of DDBJ/EMBL/GenBank.

As needed, or about every two years now, NCBI assembles the sequences of the individual
human genomic clones into contigs and chromosomes. These assemblies are performed using
the sequence data in GenBank as of a set date. A description of this process is at
http://www.ncbi.nlm.nih.gov/genome/guide/human/release_notes.html. The contigs become
part of the NCBI RefSeq project (Pruitt et al., 2005), and are annotated with genes and
other features and assigned an accession number of the format NT_######. The UCSC Genome
Browser (UNIT 1.4) and Ensembl (UNIT 1.15) use the NCBI chromosomes as the starting
material in their annotation pipelines.

The mouse genome sequence has been produced by a hybrid strategy that involves a
combination of whole-genome shotgun and clone-by-clone sequencing (Waterston et al.,
2002). The assembly process is performed at NCBI and is described at
http://www.ncbi.nlm.nih.gov/genome/guide/mouse/release_notes.html. In addition to the
reference C57BL/6 mouse genome assembly, NCBI also displays two additional whole-genome
assemblies, the mixed-strain Celera assembly and the C57BL/6 MGSCv3 assembly, as well as
several strain-specific partial genome assemblies (129 substrain, A/J, B6/CBAF1J, Balb/c,
C3H, NOD, and unknown). The NCBI mouse genome assembly is also used by the UCSC Genome
Browser and Ensembl as starting material for their annotation pipelines.

Other complete or near-complete mammalian genome sequences have also been described.
Map Viewers for the following organisms are also available, with similar navigation to
that shown for human and mouse: Brown Norway rat (Gibbs et al., 2004); chicken (Hillier
et al., 2004); domestic dog (Lindblad-Toh et al., 2005); and chimpanzee (Chimpanzee
Sequencing and Analysis Consortium, 2005). NCBI has developed Map Viewers for an
additional 34 organisms. The amount of sequence and annotation available in these Map
Viewers is variable.

Critical Parameters and Troubleshooting
The human genomic sequence is a work in progress, and updates will continue, even though
it was declared finished in April 2003. NCBI periodically updates the human genome
assembly based on these new sequence data. The “build” number of the assembly is
displayed prominently at the top of each Map Viewer page. The examples in this unit were
all illustrated using build 36. The mouse genome assembly will also change over time; the
version shown in this unit is an NCBI assembly called build 36 as well. Although users
working on later mouse and human genome builds will not be able to recreate the exact
figures shown here, the queries themselves will remain valid.

At present, only the two most recent human and mouse builds are available at NCBI;
older builds can be retrieved from the UCSC Genome Browser. Sequence coordinates along
the chromosome frequently change from build to build. Furthermore, changes in sequence
data or algorithm implementation can sometimes cause large changes in the assembly; genes
can move around within, or even between, chromosomes. In some cases, the assembly
provided by NCBI may not be correct because of errors in the build process or in the
underlying data. If a region of the assembly is suspect, it may be worth reviewing older
versions of the genome assembly at UCSC. The Basic Protocol of UNIT 1.4, which covers the
UCSC Genome Browser, describes how to correlate genomic positions between different
assemblies.

Suggestions for Further Analysis
All of the data presented in the Map Viewer are also available for download from the
NCBI FTP site. Advanced users who want to write their own scripts to manipulate the data
can access text files from organism-specific directories at
ftp://ftp.ncbi.nih.gov/genomes/. Neither the Web interface nor the databases can be
downloaded at this time.

The NCBI Map Viewer is only one view of the human genome. In many cases, it may be
useful to look at the same region of the genome using the UCSC Genome Browser or Ensembl.
Since the three sites use different methods to align mRNAs and ESTs to the genome, as
well as different gene-prediction algorithms, the positions or numbers of predicted genes
may vary. When doing such comparisons, however, one must be careful to
check that the same assembly of the genome is being viewed at each site. The user may also
discover that different sites are better for different types of queries. Query results
load much more quickly in the UCSC Genome Browser than in either the Map Viewer or
Ensembl, making UCSC more efficient for fast searches. At both NCBI and Ensembl, sequence
comparisons against the genome are performed with BLAST. UCSC provides the BLAT program,
which, while sometimes not as sensitive as BLAST, is often much faster. Both UCSC and
Ensembl allow users to display their own data in the context of the publicly available
annotations, a tool that NCBI does not yet provide. However, NCBI provides more
non-sequence-based maps, such as the Mitelman Breakpoint, deCODE, and Stanford G3 maps.
In short, in order to make the most of the human genome data, users should learn to use
all three sites.

Disclaimer
This unit was written by Dr. Tyra G. Wolfsberg in her private capacity. No official
support or endorsement by the National Institutes of Health or the United States
Department of Health and Human Services is intended or should be inferred.

Literature Cited
Chimpanzee Sequencing and Analysis Consortium. 2005. Initial sequence of the chimpanzee
genome and comparison with the human genome. Nature 437:69-87.

Gibbs, R.A., Weinstock, G.M., Metzker, M.L., et al. 2004. Genome sequence of the Brown
Norway rat yields insights into mammalian evolution. Nature 428:493-521.

Green, E.D. 2001. Strategies for the systematic sequencing of complex genomes. Nat. Rev.
Genet. 2:573-583.

Hillier, L.W., Miller, W., Birney, E., et al. 2004. Sequence and comparative analysis of
the chicken genome provide unique perspectives on vertebrate evolution. Nature
432:695-716.

International Human Genome Sequencing Consortium. 2004. Finishing the euchromatic
sequence of the human genome. Nature 431:931-945.

Lander, E.S., Linton, L.M., Birren, B., et al. 2001. Initial sequencing and analysis of
the human genome. Nature 409:860-921.

Lindblad-Toh, K., Wade, C.M., Mikkelsen, T.S., et al. 2005. Genome sequence, comparative
analysis and haplotype structure of the domestic dog. Nature 438:803-819.

Pruitt, K.D., Tatusova, T., and Maglott, D.R. 2005. NCBI Reference Sequence (RefSeq): A
curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic
Acids Res. 33:D501-D504.

Schuler, G.D. 1998. Electronic PCR: Bridging the gap between genome mapping and genome
sequencing. Trends Biotechnol. 16:456-459.

Waterston, R.H., Lindblad-Toh, K., Birney, E., et al. 2002. Initial sequencing and
comparative analysis of the mouse genome. Nature 420:520-562.

Key References
Wolfsberg, T.G., Wetterstrand, K.A., Guyer, M.S., Collins, F.S., and Baxevanis, A.D.
2003. A user’s guide to the human genome. Nat. Genet. 35:1-79.

The User’s Guide to the Genome is a hands-on manual for browsing and analyzing genomic
data. The majority of the supplement shows a series of examples from Ensembl, the NCBI
Map Viewer, and the UCSC Genome Browser that provide answers to the most common types of
questions in sequence-based biology. The User’s Guide is available free of charge from
the Nature Genetics Web site at http://www.nature.com/ng/supplements/.

Contributed by Tyra G. Wolfsberg
Bethesda, Maryland
