Lectura Complementaria 6
1093/bib/bbt001
Advance Access published on 1 February 2013
Abstract
GBrowse is a mature web-based genome browser that is suitable for deployment on both public and private
web sites. It supports most genome browser features, including qualitative and quantitative (wiggle) tracks,
track uploading, track sharing, interactive track configuration, semantic zooming and limited smooth track panning.
Corresponding author. Lincoln D. Stein, Ontario Institute for Cancer Research, 101 College Street, Suite 800, Toronto, Ontario,
Canada M5G 0A3, and Department of Molecular Genetics, University of Toronto, Canada. Tel: 416 673-8514; Fax: 416 977-1118;
Email: lincoln.stein@gmail.com
Lincoln Stein is the Director of Informatics and Biocomputing at the Ontario Institute for Cancer Research and is a lead investigator
on multiple public genome resources including Reactome, WormBase and modENCODE.
Remote ssh access from non-host machines is disabled by default, but you can enable it by configuring an ethernet bridge adaptor as described in Chapter 6 of the VirtualBox manual (www.virtualbox.org/manual/ch06.html). Note that if you enable remote access to the VM, it is a very good idea to change the Administrator password, which you can do by issuing the ‘passwd’ command from the command line or by selecting ‘System->Users and Groups’ and using the graphical user interface to change Administrator’s password.

[...] filter for public images named ‘GBrowse’. Right-click on the AMI with the latest version number and select ‘Launch instance’ to bring up the request instances wizard.
The wizard will prompt you for a number of properties of the virtual machine to launch. The most important of these is the Instance Type, which controls the number and speed of CPUs and the amount of memory that the VM will have. For GBrowse, you should choose at least the ‘Small’ instance. For better performance, choose the ‘Medium’ or ‘Large’ instances. Faster instances cost more per hour.
[...] cerevisiae, 11 April 2012, SacCer_Apr2011/sacCer3) and the nematode (Caenorhabditis elegans, October 2010, WS220/ce10). The Amazon Virtual Machine edition includes the yeast and nematode genomes, as well as human (Homo sapiens, February 2009, GRCh37/hg19). These databases contain the chromosome sizes, genomic DNA and a set of reference gene models and noncoding RNAs.
To add additional databases, both virtual machines come with the ‘import_ucsc_db.pl’ script, which creates starter databases from information in the UCSC genome browser (genome.ucsc.edu). This can be used to add the human hg19 genome build [...]

We will first discuss uploading. For this example, we use a small (5.4 M) modENCODE (www.modencode.org) SAM file obtained by performing RACE sequencing of the 3′-UTRs of C. elegans L1 larval RNA. Download the file using either the full URL ftp://data.modencode.org/all_files/cele-signal-1/2327_L1.ws220.sam.gz or its equivalent ‘tiny URL’, http://tinyurl.com/9ns9fjz. If you try this with your own SAM file, be careful to match the genome build (WS220/ce10) and the naming convention for the chromosomes. The starter GBrowse databases all use unadorned chromosome names, such as ‘1’ and ‘III’. This is consistent with the [...]
Figure 2: Summary information about an uploaded SAM file. The summary information includes the name and
description of the uploaded data, which can be edited by clicking on the respective fields, and information about
the date and size of the upload. The ‘Sharing’ area allows the user to enable sharing of the track with select
collaborators or with the public as a whole.
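
Before uploading your own SAM file, it can help to check that its reference sequence names follow the unadorned naming convention used by the starter databases. The following is a minimal sketch under stated assumptions, not part of GBrowse: the function names `sam_seq_names` and `sam_chr_prefixed` are hypothetical, and the sketch assumes a standard SAM header in which each `@SQ` line carries an `SN:` field. It relies only on `gzip`, `awk` and `grep`.

```shell
# Hypothetical helper (not part of GBrowse): print the reference sequence
# names declared in the @SQ header lines of a SAM file.
sam_seq_names() {
    # gzip -cdf decompresses gzip input and copies plain text through unchanged,
    # so the same function handles .sam and .sam.gz files.
    gzip -cdf "$1" | awk -F '\t' '
        /^@SQ/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) print substr($i, 4)   # strip the "SN:" tag
        }
    '
}

# Hypothetical helper: list only the names that carry a UCSC-style "chr"
# prefix, which would not match unadorned names such as "III".
# Returns a non-zero status when no such names are found.
sam_chr_prefixed() {
    sam_seq_names "$1" | grep '^chr'
}
```

For example, `sam_seq_names 2327_L1.ws220.sam.gz` lists the sequence names in the example file, and `sam_chr_prefixed my_reads.sam` (a placeholder file name) prints any names that would need their prefix removed before upload.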
[...] shows details about the read and a text representation of the alignment. To change the appearance of the sequence alignment track, click on the toolbox icon that appears in the track’s titlebar. This will bring up a dialog that allows you to change colors, size, the presence or absence of read names and various other features.
You may upload a BAM file in exactly the same manner as you uploaded a SAM file. The advantage of this over SAM format is that processing will be quicker because the server does not have to convert it into BAM internally. You may upload as many BAM/SAM tracks as you like and select which ones are displayed using the ‘Select Tracks’ panel.
Uploaded files are inaccessible to other GBrowse users unless they are explicitly shared. However, the files are readable by anyone who can log into the virtual machine.

Track sharing
Uploaded BAM/SAM tracks can be shared with collaborators. To do this, go back to ‘Custom Tracks’ and click on the ‘Sharing’ popup menu. Select ‘Casual’ sharing to get a sharing link, and email this link to whoever you wish. They can get access to the track by clicking on this link in the received email.
To make a track public, select ‘Custom Tracks->Sharing->Public’. This will enable anyone to find and view your track using the search features of the ‘Community Tracks’ panel. To aid in sharing, you should give your public track a good descriptive name and description, which you can do by clicking on the upload’s name and description fields.
The last sharing option is called ‘Group’. In this mode, you can share the track with a specific set of named collaborators. For this to work, you will need to know the collaborators’ email addresses or GBrowse login names. Select ‘Custom Tracks->Sharing->Group’, and then type in a portion of the first collaborator’s email address or login name in the ‘Enter a username’ text field (autocomplete will help you select the correct user). Click ‘Add’ to
authorize this user. You may repeat this process multiple times to add additional collaborators.
When you are finished with your upload(s), go to ‘Custom Tracks’ and click on the trash can icon to delete the ones you wish.

Uploading a BAM/SAM file as the administrator
With a slight modification of the above recipe, you can upload an NGS file in a way that allows it to become listed as a public track. The only difference is that you must log into GBrowse as the administrator [...]

[...] high-coverage Illumina sequence from an anonymous individual, mapped onto chromosome 1 of the GRCh37 build of the reference genome.
If you are using the VirtualBox VM, you will first need to install a starter human hg19/GRCh37 database. Log into the VM as the ‘admin’ user (password ‘gbrowse’), open a terminal window and type the following command:

import_ucsc_db.pl --remove-chr hg19 'H. sapiens (hg19/GRCh37)'

This will contact the UCSC genome browser to [...]
[...] then make it owned by the admin user. We use the ‘sudo’ command to gain root privileges to allow this:

sudo mkdir /opt/gbrowse/databases/hg19/NGS_alignments
sudo chown admin /opt/gbrowse/databases/hg19/NGS_alignments

Next, copy one or more alignment files into this directory. You may use BAM or SAM files, and the SAM files may be compressed with gzip if you wish. To get the files, you may use the ‘wget’ command to [...]

[...] The ‘sudo’ is needed because hg19.conf is normally owned by the ‘root’ user, although you are free to change this.
Restart the web server with:

sudo service apache2 restart

You will now be able to see the alignment track in the ‘Select Tracks’ section of the genome browser page (remember that only chromosome 1 is represented in the downloaded file!). If you wish to customize any aspect of the track, such as its name, you simply edit the appropriate track configuration section [...]
GBrowse runs in a web server and is accessed via any modern web browser.
Next-generation sequencing data tracks can be installed permanently as public tracks, uploaded on an as-needed basis, imported via URLs and selectively shared with other users.
The software is most suitable in a collaborative environment where visualization of sequencing data is shared among multiple local and remote collaborators.
GBrowse is available as preconfigured virtual machines running on the desktop or the Amazon Elastic Compute Cloud, as well as in source code and binary form.

Acknowledgements
The author wishes to thank Dr Scott Cain for assistance with configuring and testing the VMs and four anonymous reviewers