
Community Curation and Expert Curation of Human Long Noncoding RNAs with LncRNAWiki and LncBook
Lina Ma,1,2,5,6 Jiabao Cao,1,2,3,5 Lin Liu,1,2,3,5 Zhao Li,1,2,3 Huma Shireen,4
Nashaiman Pervaiz,4 Fatima Batool,4 Rabail Z. Raza,4 Dong Zou,1,2
Yiming Bao,1,2,3 Amir A. Abbasi,4 and Zhang Zhang1,2,3,6
1 BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
2 CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics,
Chinese Academy of Sciences, Beijing, China
3 University of Chinese Academy of Sciences, Beijing, China
4 National Center for Bioinformatics, Programme of Comparative and Evolutionary
Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
5 These authors contributed equally to this work.
6 Corresponding authors: malina@big.ac.cn; zhangzhang@big.ac.cn

In recent years, the number of human long noncoding RNAs (lncRNAs) that have been
identified has increased exponentially. However, these lncRNAs are poorly annotated
compared to protein-coding genes, posing great challenges for understanding their
functional significance and elucidating their complex molecular mechanisms. Here we
employ both community and expert curation to yield a comprehensive collection of human
lncRNAs and their annotations. Specifically, LncRNAWiki
(http://lncrna.big.ac.cn/index.php/Main_Page) uses a wiki-based community curation
model, thus showing great promise in dealing with the flood of biological knowledge,
while LncBook (http://bigd.big.ac.cn/lncbook) is an expert curation–based database that
provides a complement to LncRNAWiki. LncBook features a comprehensive collection of
human lncRNAs and a systematic curation of lncRNAs by multi-omics data integration,
functional annotation, and disease association. These protocols provide step-by-step
instructions on how to browse and search a specific lncRNA and how to obtain a range
of related information including expression, methylation, variation, function,
and disease association. © 2019 by John Wiley & Sons, Inc.

Keywords: community curation, disease, expert curation, long noncoding RNA, multi-omics data

How to cite this article:
Ma, L., Cao, J., Liu, L., Li, Z., Shireen, H., Pervaiz, N., Batool, F.,
Raza, R. Z., Zou, D., Bao, Y., Abbasi, A. A., & Zhang, Z. (2019).
Community curation and expert curation of human long noncoding
RNAs with LncRNAWiki and LncBook. Current Protocols in
Bioinformatics, 67, e82. doi: 10.1002/cpbi.82

INTRODUCTION
Due to the rapid advancement of next-generation sequencing technology, more and more
long noncoding RNAs (lncRNAs) have been discovered in many species ranging
from mammals to plants (Fang et al., 2018). In particular, the number of known human
lncRNAs has increased exponentially (Derrien et al., 2012; Fang et al., 2018; Iyer et al.,

Ma et al.

Current Protocols in Bioinformatics e82, Volume 67 1 of 19


Published in Wiley Online Library (wileyonlinelibrary.com).
doi: 10.1002/cpbi.82

© 2019 John Wiley & Sons, Inc.
2015; Ma et al., 2015, 2019; Volders et al., 2015). For example, in 2014 we integrated
human lncRNAs from different sources and obtained 105,255 lncRNA transcripts, and
this number has continued to steadily increase, resulting in 270,044 lncRNA transcripts
in 2018 (Ma et al., 2019).

Given the large number of human lncRNAs, it would be laborious and time consuming
to rely mainly on expert curation to curate these lncRNAs. In contrast, a wiki
platform enables any user to edit the information at any time and thus features
collaborative information integration, up-to-date content, and low maintenance cost. Based on
MediaWiki, we developed LncRNAWiki (http://lncrna.big.ac.cn/index.php/Main_Page)
to harness collective efforts to collect, edit, and annotate information about human
lncRNAs (Ma et al., 2015). In the past years, LncRNAWiki has been frequently updated
by curating more experimentally validated human lncRNAs, linking lncRNAs to diseases,
and identifying small peptides encoded by lncRNAs (BIG Data Center Members, 2017,
2018, 2019). In LncRNAWiki, users can search lncRNAs and browse basic information,
including genomic location, classification, exon number, and sequence, and obtain the
annotations on function, expression, and disease association, among others. In addition,
registered users can conveniently add a newly discovered lncRNA and edit or curate
existing lncRNAs.

While LncRNAWiki has been an extremely useful tool for building a database of lncRNAs
and their associated information, it has significant limitations in managing structured data
and providing customized functionalities. In LncRNAWiki, the functional annotations
and sequence data are stored as unstructured text, which makes it difficult to retrieve and
show data items of interest. To organize large-scale annotations in a structured manner
and to provide customized Web functionalities with friendlier interfaces, we
constructed LncBook (http://bigd.big.ac.cn/lncbook; Ma et al., 2019) as a complement to
the community curation–based LncRNAWiki. In LncBook, users can obtain the
systematically curated function and disease information, which is derived from LncRNAWiki.
Most importantly, users can access large-scale lncRNA-related data including
expression, methylation, variation, and interaction, among others, while also taking advantage
of fully incorporated analysis tools.

Collectively, the community-and-expert curation model, as implemented in LncRNAWiki
and LncBook, is of great importance to provide up-to-date knowledge, high-quality
annotations, and abundant omics data, which can significantly enrich and improve
lncRNAs’ annotations and support studies for in-depth investigation of lncRNA function and
mechanism.

BASIC PROTOCOL 1
COMMUNITY CURATION OF HUMAN LncRNAs USING LncRNAWiki
LncRNAWiki is dedicated to community curation of human lncRNAs, especially the
functionally studied lncRNAs. In the present version of LncRNAWiki, 1867 featured
lncRNAs have been curated and associated with diseases. This protocol demonstrates
the workflow for searching a specific lncRNA, browsing function and disease
association annotations, editing or updating the annotations based on the published
literature, adding a newly reported lncRNA, and downloading annotation or sequence
data. We use the lncRNA MALAT1 as an example of how one can search and use the
wiki.
Necessary Resources
Hardware
Computer with Internet connection
Software
Up-to-date Web browser, such as Firefox, Safari, or Internet Explorer
Search lncRNA in LncRNAWiki
Users can search lncRNA by transcript ID, symbol, or other keywords. In the homepage,
a global search in LncRNAWiki can be performed by entering the transcript ID or symbol,
keywords, or phrases in the search box and pressing “Enter” or by clicking the magnifier
icon in the search box. The related pages that include both the page title matches and
page text matches will be shown, or a page will appear with a message informing you
that no page has the keywords and phrases. If a page has the same title as the keyword,
press “Enter” to jump to that page directly.

1. Open LncRNAWiki (http://lncrna.big.ac.cn/index.php/Main_Page) on a Web
browser. As an example, type “MALAT1” in the search box, and press “Enter,”
which will lead to the results containing the keyword (Fig. 1). Alternatively, type a
phrase such as “breast cancer,” which will lead to the results page that displays all
breast cancer–associated lncRNAs and other related pages.
The search functionality operates on whole words, separated by spaces or other punctu-
ation marks. If users would like to search for an exact phrase, double quotes should be
used.
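
As a minimal sketch of the search behavior described above, the following Python helper builds a search URL for LncRNAWiki. It assumes that the site accepts the standard MediaWiki `search` query parameter on `index.php`, which is not stated in this protocol; the function name `build_search_url` is hypothetical, and no request is sent.

```python
from urllib.parse import urlencode

# Base URL taken from the protocol. The `search` parameter is the standard
# MediaWiki one; that this site accepts it is an assumption.
BASE_URL = "http://lncrna.big.ac.cn/index.php"


def build_search_url(term: str, exact_phrase: bool = False) -> str:
    """Return a search URL; wrap the term in double quotes for an exact phrase."""
    query = f'"{term}"' if exact_phrase else term
    return f"{BASE_URL}?{urlencode({'search': query})}"


# Whole-word search for a symbol, then an exact-phrase search.
print(build_search_url("MALAT1"))
print(build_search_url("breast cancer", exact_phrase=True))
```

Quoting is handled in one place so that the whole-word and exact-phrase cases differ only by the `exact_phrase` flag.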

Browse functional annotations and genomic annotations
The content of every transcript in LncRNAWiki is structured into two parts: basic
information and user-editable information. Basic information is not editable, while the
user-editable section allows users to add or delete annotations of experimentally
validated or featured lncRNAs and allows users to format the text and add links, tables, and
figures.

2. Using “MALAT1” as an example, open the page titled “ENST00000534336.1”
(Fig. 2).

Figure 1 Search results of “MALAT1.” There is no page titled “MALAT1,” and all page text
matches are displayed. In LncRNAWiki, MALAT1 corresponds to the ID “ENST00000534336.1.”

Figure 2 Screenshot of the annotation page of “MALAT1.” There are two parts: the user-editable part and the
Basic Information table. The user-editable part includes Annotated Information, Labs working on this lncRNA,
and References, which allow users to edit the information; the Basic Information table is not editable.

3. Browse the user-editable section, which appears at the top of the page.
The user-editable section stores various annotations as unstructured text, including
Annotated Information, Labs working on this lncRNA, and References.

4. Browse the Basic Information table, which mainly contains genomic annotations.
This table has ten subsections: Transcript ID, Source, Same with, Classification,
Length, Genomic location, Exon number, Exons, Genome context, and Sequence.
Edit or update the annotations of lncRNAs
The annotations in LncRNAWiki can be edited or updated, which enables the content
to be kept correct and up to date.

5. Before editing/curating, log into your user account.


6. In the transcript page, click “Edit” at the top of the page to launch the curation page
(Fig. 3).
Basic Information cannot be edited. Other sections, including Annotated Information,
Labs working on this lncRNA, and References, are editable. Annotated Information
contains several subsections, including Name, Characteristics, Expression, Regulation,
Function, Disease, and Evolution, among others.

7. Users can add new subsections or delete irrelevant subsections.


8. Use wiki markup (http://lncrna.big.ac.cn/index.php/LncRNAWiki:FAQ#Editing_
Tips) to format the text or add links, tables, and images.
9. After editing, click “Show preview” to verify the edits are correct and are correctly
displayed. Finally, click “Save changes” to save all changes.
LncRNAWiki allows any user to view and search, but only registered users can add and
edit content. Using open identities provided by registrants not only improves content
reliability and increases collaboration and communication, but also rewards community
curation efforts by giving explicit authorship.

Figure 3 Screenshot of the curation page of “MALAT1.” Users should use wiki markup to format the text.
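
To illustrate what the user-editable part might look like in wiki markup, the sketch below assembles a draft from a few subsections and references. It uses only standard MediaWiki syntax (`==Heading==`, `===Subheading===`, `* ` bullets, and `[URL label]` external links); the heading levels used on actual LncRNAWiki pages, the function name `render_annotation`, and all placeholder text are assumptions.

```python
def render_annotation(sections: dict, references: list) -> str:
    """Render subsections and references as MediaWiki markup (illustrative only)."""
    lines = ["==Annotated Information=="]
    for title, text in sections.items():
        lines.append(f"==={title}===")  # level-3 heading for each subsection
        lines.append(text)
    lines.append("==References==")
    for url, label in references:
        lines.append(f"* [{url} {label}]")  # bulleted external link
    return "\n".join(lines)


# Placeholder content; replace with statements supported by published papers.
draft = render_annotation(
    {
        "Function": "Placeholder summary of the reported function.",
        "Disease": "Placeholder summary of the reported disease association.",
    },
    [("https://pubmed.ncbi.nlm.nih.gov/", "Placeholder citation")],
)
print(draft)
```

The resulting text can be pasted into the curation page and checked with “Show preview” before saving.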

Add a new lncRNA


10. Before adding a lncRNA, first check whether this lncRNA is already included in
LncRNAWiki. To do so, search the ID/symbol in the search box and/or search the
sequence using BLAST.
11. If the lncRNA is not included in LncRNAWiki, log in and input the lncRNA
ID/symbol in the search box. You will then be asked if you would like to create a
specific page titled with the ID/symbol (Fig. 4).
12. Click the highlighted ID/symbol, and a new page will be created.
13. Input the text, and use wiki markup to format the text and add tables or
figures.
14. Click the preview button to verify text and formatting are correct, and then save all
edits.
To guarantee the quality of user submissions, the submitted lncRNA must be reported by
at least one published paper, and the annotation information must cite the corresponding
paper(s) in the proper manner in the References section. LncRNAWiki does not accept
new lncRNAs that lack support from published paper(s).
As an example, type “TEST-LncRNA” in the search box and press “Enter.” This keyword
does not exist in LncRNAWiki (Fig. 4). To create a new page and annotate related
information, click “TEST-LncRNA,” and add the related information in the curation
page. Then, click “Save page” to save the edits.

Download
15. Click on “Downloads” on the left side of the page.
Readme, data sources, basic information, sequence, and small protein are all freely
available. If you use the data, please cite the database and the related publications.

Figure 4 Create a page for a new lncRNA in LncRNAWiki. The keyword “TEST-LncRNA” does not exist in
LncRNAWiki. Clicking the highlighted “TEST-LncRNA” will create a new page titled “TEST-LncRNA.”

BASIC PROTOCOL 2
USING LncBook TO OBTAIN MULTI-OMICS INFORMATION AND DISEASE ANNOTATIONS OF HUMAN LncRNAs
LncBook is an expert curation–based human lncRNA knowledgebase and is dedicated
to providing users with a comprehensive list of human lncRNAs, abundant multi-omics
data, and systematic annotations of functional mechanisms and disease associations.
This protocol demonstrates workflows for browsing both predicted and experimentally
validated lncRNAs; searching a specific lncRNA or a group of lncRNAs of interest;
browsing expression, methylation, variation, and interaction data; browsing functional
mechanisms; and browsing disease associations.
Necessary Resources
Hardware
Computer with Internet connection
Software
Up-to-date Web browser, such as Firefox, Safari, or Internet Explorer
Browse lncRNAs in LncBook
In the Transcripts pull-down list in the navigation menu, there are two sections, LncRNAs
and Featured LncRNAs, which archive all the integrated human lncRNA transcripts
and all the functionally studied lncRNA transcripts, respectively. Alternatively, the two
sections can be accessed in the Resources section in the middle of the homepage. This
protocol demonstrates workflows for browsing lncRNAs by chromosome and GC content,
the content of a lncRNA transcript or gene page, and all the experimentally validated
lncRNAs.

Figure 5 Browse lncRNAs in LncBook. Click on the transcript or gene ID of interest in the table of lncRNAs
(A) to access the Transcript page (B) or Gene page (C), respectively, where there are multiple subsections
describing basic information of a lncRNA transcript or gene.

1. Browse lncRNAs by chromosome and GC content. The LncRNAs section enables
users to browse lncRNA transcripts and their genes by chromosome, length, and GC
content (Fig. 5). Open LncBook (http://bigd.big.ac.cn/lncbook) on a Web browser,
and under Transcripts select “LncRNAs.” Select a chromosome from the
Chromosome drop-down list, and set the range of GC content or length by dragging the
sliding window.
The search results will be displayed in a table. In addition to transcript IDs (prefixed with
HSALNT) and gene IDs (prefixed with HSALNG), basic information including genomic
location, classification category, length, GC content, open reading frame (ORF) length,
and exon number is also summarized in the table.
For example, choosing Chromosome: “Chromosome1,” Length (nt): “200 to 2660,” and
GC Content (%): “41 to 61” launches a new results table with a list of hyperlinked
transcripts that match the filter conditions (Fig. 5).
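
The length, GC content, and ORF length columns can be reproduced for any sequence with a few lines of code. The sketch below is not LncBook's implementation: the function names are hypothetical, the ORF definition (ATG to the first in-frame stop codon, forward strand only) is an assumption, and the default thresholds simply mirror the example filter above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def longest_orf_length(seq: str) -> int:
    """Length in nt of the longest ATG-to-stop ORF on the forward strand."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                best = max(best, pos + 3 - start)
                break
    return best


def passes_filter(seq, min_len=200, max_len=2660, min_gc=41.0, max_gc=61.0):
    """Apply the length and GC content ranges from the example filter."""
    return min_len <= len(seq) <= max_len and min_gc <= gc_content(seq) <= max_gc


toy = "ATGC" * 60  # 240 nt toy sequence, not a real transcript
print(len(toy), gc_content(toy), longest_orf_length(toy), passes_filter(toy))
```

Keeping the three measures as separate functions makes it easy to swap in a different ORF definition without touching the filter.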

2. Browse lncRNA transcript. LncBook manages human lncRNAs based on transcripts.
Click the transcript ID to access the Transcript page, where users can browse
the related information, including basic information (symbol, gene ID, genomic
context, length, exon number, GC content, classification, sequence, longest ORF
length, coding potential), multi-omics data (expression, methylation, genome
variation, lncRNA-miRNA interaction), function annotations, and disease associations
(Fig. 5).

Figure 6 Browse featured lncRNAs. Featured LncRNAs allows users to browse the functionally curated
lncRNAs by gene symbol, transcript ID, gene ID, or synonyms. Basic information of each featured lncRNA is
summarized in the table.

3. Browse lncRNA gene. Click the gene ID to access the Gene page, where users
can browse gene-related information, including basic information (genomic context
and length), transcripts, and multi-omics data (expression, methylation, genome
variation) as shown in Figure 5. The reciprocal links of transcripts are also available
in the Gene page.
4. Browse featured LncRNAs. The Featured LncRNAs section enables users to browse
all the functionally curated lncRNAs by gene symbol, transcript ID, gene ID, or
synonyms. For the featured lncRNAs, the primary information, including functional
mechanism, biological process, disease, MeSH ontology, and PMID, is
summarized in the results table (Fig. 6).
a. Click on hyperlinks within the Transcript ID or Gene ID columns to browse the
Transcript or Gene page, respectively.
b. Click on the hyperlinks within the Symbol column to browse the annotation
information in LncRNAWiki.
c. Click on hyperlinks within the Functional Mechanism and MeSH Ontology
columns to access search results of these controlled vocabularies. The hyperlinks
within the PMID column will direct users to the publication in PubMed.
Search lncRNAs in LncBook
LncBook supports searches for multiple types of keywords (ignoring letter case) such as
gene symbol, gene ID, transcript ID, GENCODE ID, disease name, function type, and
classification category.

5. In the homepage, type or copy a keyword into the search box in the middle of the
page, and then click “Search” on the right of the search box to perform a keyword
search (Fig. 7). This will lead to a results page that displays a list of lncRNAs that
match the search.
For example, entering “breast cancer” in the search box and then clicking the search
button will lead to a results page containing all lncRNAs that are associated with breast
cancer. The search results for breast cancer by default only show items including Tran-
script ID, Gene ID, Symbol, Classification, Biological Processes, and Disease. For a
personalized view, click the pull-down menu in the top right corner of the search results
to add or remove items (Fig. 7).
Figure 7 Basic search in LncBook. The example shows the output of “breast cancer.” For a personalized
view, click the pull-down box to add or remove items.

6. LncBook also allows for an individual feature search in different sections; it may
be convenient to search a lncRNA or a specific group of lncRNAs in the resources,
including Featured LncRNAs, Disease, Function, Methylation, and Variation.
Browse multi-omics data
In the Multi-Omics pull-down list in the navigation menu or Resources section in the
middle of homepage, hyperlinks provide access to basic browsing of multi-omics data
including expression, methylation, variation, and interaction (Fig. 8).

7. Expression provides tissue expression profiles of lncRNAs based on two sets of
RNA-seq data: the Human Protein Atlas (HPA; 32 normal human tissues covered;
Uhlen et al., 2015) and Genotype-Tissue Expression (GTEx; 53 normal human
tissues covered; The GTEx Consortium, 2015). Expression allows specific browsing
by setting data sets, tau values (tissue specificity index; Yanai et al., 2005), expression
breadth, and coefficient of variation (CV).
a. Click “Expression” to browse expression-related parameters that are summarized
in a table that contains columns for Maximum value, Average value, Median
value, CV, tau value, and Expression Breadth across different tissues.
For example, choosing the filter options Data Sets: “GTEx (Genotype-Tissue Expression,
53 tissues),” Tau Value: “≥0.9 and ≤1.0,” Expression Breadth: “≥4 and ≤50,” and
CV: “≥4 and ≤7” and then clicking “Filter” leads to the result table shown in Figure 9.

Figure 8 Browse multi-omics data. Multi-omics data can be accessed by clicking items in the pull-down list
of the navigation menu or Resources section in the middle of homepage. Multi-omics data covers expression,
methylation, variation, and interaction.

Figure 9 Browse lncRNA expression profiles across normal human tissues: (A) filter options, (B) lncRNAs
that match the filter options in (A), (C) expression profile of the lncRNA “HSALNT0001588,” and (D) criteria for
tissue-specific or housekeeping lncRNAs.

b. Click the button in the Chart column to view expression levels of the transcript
in different tissues in a box plot (Fig. 9).
c. Click the hyperlinks in the left corner of the expression page to view results
pages containing tissue-specific lncRNAs and housekeeping lncRNAs, which are
characterized with specific parameters (Fig. 9).

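
The expression parameters used as filters can be illustrated with a short calculation. The sketch below computes the tau index in the form given by Yanai et al. (2005), tau = Σ(1 − x_i / x_max) / (n − 1), together with CV and expression breadth. The use of the population standard deviation for CV and the breadth cutoff of 1.0 are assumptions, since the protocol does not state how LncBook defines them, and the toy profiles are not real data.

```python
from statistics import mean, pstdev


def tau_index(values):
    """Tissue specificity index: 0 for uniform expression, 1 for one tissue only."""
    n = len(values)
    highest = max(values)
    if n < 2 or highest == 0:
        return 0.0
    return sum(1 - v / highest for v in values) / (n - 1)


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population SD is an assumption)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def expression_breadth(values, cutoff=1.0):
    """Number of tissues with expression at or above an assumed cutoff."""
    return sum(1 for v in values if v >= cutoff)


tissue_specific = [10.0, 0.0, 0.0, 0.0]  # toy profile across four tissues
housekeeping = [5.0, 5.0, 5.0, 5.0]      # toy profile across four tissues
for profile in (tissue_specific, housekeeping):
    print(tau_index(profile), coefficient_of_variation(profile), expression_breadth(profile))
```

A profile expressed in a single tissue gives a tau of 1 and a breadth of 1, whereas a uniformly expressed profile gives a tau of 0 and a CV of 0, which is the contrast between tissue-specific and housekeeping lncRNAs.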
Figure 10 Browse methylation profiles across different cancers. Methylation levels of promoters and body
regions can be viewed in dot plots. Methylation allows specific browsing by gene symbol, synonyms, and
transcript ID.

8. Methylation archives DNA methylation levels of both cancer and normal
samples in nine different types of cancer (bladder urothelial carcinoma, glioblastoma
multiforme, lung squamous cell carcinoma, lung adenocarcinoma, breast invasive
carcinoma, colon adenocarcinoma, rectum adenocarcinoma, stomach
adenocarcinoma, uterine corpus endometrial carcinoma). Methylation level is calculated based
on whole genome bisulfite sequencing data.
a. Click “Methylation” to browse the average DNA methylation level of promoter
regions in nine types of cancer (Fig. 10).
b. Click on the chart hyperlinks to view dot plots of methylation levels of the
promoter and body regions in cancer and normal samples (Fig. 10).
The Methylation section also allows specific browsing by gene symbol, synonyms, and
transcript ID (Fig. 10).
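
As a rough illustration of how an average methylation level for a region could be derived from bisulfite sequencing counts, the sketch below uses the common per-site ratio of methylated reads to total reads and averages it over the sites in a region. This is an assumed, generic calculation and not necessarily the procedure used by LncBook; the read counts are toy values.

```python
import math


def site_level(methylated: int, unmethylated: int) -> float:
    """Methylation level of one CpG site: methylated reads over total reads."""
    total = methylated + unmethylated
    return methylated / total if total else math.nan


def region_average(sites):
    """Mean methylation level over the covered sites of a region."""
    levels = [site_level(m, u) for m, u in sites if m + u > 0]
    return sum(levels) / len(levels) if levels else math.nan


# Toy (methylated, unmethylated) read counts for one promoter region.
cancer_promoter = [(8, 2), (6, 4)]
normal_promoter = [(2, 8), (4, 6)]
print(region_average(cancer_promoter) - region_average(normal_promoter))
```

Sites without coverage are skipped rather than counted as unmethylated, so that low sequencing depth does not bias the regional average toward zero.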
9. Variation provides information on single-nucleotide polymorphisms (SNPs) that are
mapped to lncRNA loci in the SNP database (dbSNP; Sherry et al., 2001). For
each SNP, minor allele frequency (MAF) values are annotated based on the 1000
Genomes Project (Abecasis et al., 2010), and pathogenic information is obtained
from ClinVar (Landrum et al., 2016) and COSMIC (Forbes et al., 2017).
a. Click “Variation” to browse SNPs in lncRNA loci by chromosome,
positive/negative strand, genomic location, and dbSNP ID.
For example, choosing the filter options “Chromosome2,” “+,” and genomic region
between “1,000” and “100,000” leads to the results table shown in Figure 11.
b. View lncRNA SNP-related information listed in the summary table. For each
SNP, lncRNA transcript ID, basic information, MAF values obtained from the
1000 Genomes Project, and pathogenic information in ClinVar and COSMIC are
displayed (Fig. 11).
Clicking the dbSNP ID provides access to the dbSNP page, where detailed information
is available.

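
The same kind of filter can be applied offline to downloaded variation data. The following sketch mirrors the example above (chromosome 2, positive strand, positions 1,000 to 100,000); the `Snp` record layout, the chromosome label format, and the identifiers are hypothetical and do not reflect the actual column names of the LncBook files.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snp:
    snp_id: str            # placeholder identifier, not a real dbSNP ID
    chromosome: str
    strand: str            # "+" or "-"
    position: int
    maf: Optional[float]   # minor allele frequency, if annotated


def filter_snps(snps: List[Snp], chromosome: str, strand: str,
                start: int, end: int) -> List[Snp]:
    """Keep SNPs on the given chromosome and strand within [start, end]."""
    return [
        s for s in snps
        if s.chromosome == chromosome and s.strand == strand
        and start <= s.position <= end
    ]


toy_snps = [
    Snp("demo_snp_1", "chr2", "+", 5_000, 0.12),
    Snp("demo_snp_2", "chr2", "-", 50_000, 0.03),
    Snp("demo_snp_3", "chr2", "+", 250_000, None),
]
selected = filter_snps(toy_snps, "chr2", "+", 1_000, 100_000)
print([s.snp_id for s in selected])
```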
Figure 11 Browse variation data. The results table shows SNPs that match the filter options.

Figure 12 Browse lncRNA-miRNA interactions. Interaction allows users to browse lncRNA-miRNA interac-
tions by gene symbol, miRNA ID, synonyms, transcript ID, and experimental evidence.

10. Interaction shows lncRNA-miRNA interactions predicted using TargetScan (Lewis,
Burge, & Bartel, 2005) and miRanda (Betel, Wilson, Gabow, Marks, & Sander,
2008) and experimentally validated interactions in StarBase (Li, Liu, Zhou, Qu,
& Yang, 2014). Click “Interaction” to view the summary table that contains the
columns Transcript ID, MiRNA ID, Score, Energy, Binding Start, Binding End, and
Experimental Evidence (Fig. 12). Interaction also allows users to browse
lncRNA-miRNA interactions by gene symbol, miRNA ID, synonyms, transcript ID, and
experimental evidence (Fig. 12).
Browse function annotations of featured lncRNAs
The current version of LncBook contains 1653 functional lncRNAs and 3762
lncRNA-function associations curated based on 2501 publications.

Figure 13 Browse lncRNA-function associations. Function allows specific searches by different items. Al-
ternatively, the lncRNAs associated with specific functional mechanisms, biological processes, or tags are
available by clicking on the hyperlink.

11. In the LncBook homepage (http://bigd.big.ac.cn/lncbook), click “Function” in the
navigation menu or in the Resources section to access the Function section, where
lncRNA-function associations can be browsed and searched. Detailed information
is summarized in the table (Fig. 13).
12. Click the link in the Symbol column to access the annotation page in LncRNAWiki,
where detailed functional mechanisms can be accessed.
The Function section allows specific searches by transcript ID, gene symbol, synonyms,
functional mechanism, biological process, and keywords (Fig. 13).
Similarly, on the top of the Function page, hyperlinks provide access to a group of
lncRNAs related through specific functional mechanisms (transcriptional regulation,
ceRNA, splicing regulation, translational control, protein localization, RNAi), biological
processes (pathogenic and developmental process), or tags (biomarker, therapy target,
SNP in lncRNA, oncogene, suppressor, stress action, miRNA precursor, evolution) as
depicted in Figure 13.

Browse disease associations of featured lncRNAs and predicted disease-associated lncRNAs
The current version of LncBook provides 3,772 experimentally validated lncRNA-disease
associations, as well as 97,998 lncRNAs that are putatively associated with diseases, and
these annotations and predictions are archived in Validated and Predicted, respectively.

13. In the homepage (http://bigd.big.ac.cn/lncbook), click “Diseases” in the navigation
menu or in the Resources section to access the Diseases section, where the
lncRNA-disease associations can be browsed and searched.
14. Click “Validated” to access experimentally validated disease-associated lncRNAs.
Disease name, MeSH ontology, dysfunction type, and PMID of related publication
are provided in the summary table (Fig. 14). Disease association information is
curated based on lncRNA-disease associations in LncRNADisease (Chen et al.,
2013) and LncRNAWiki.
Figure 14 Browse validated lncRNA-disease associations. Validated allows specific searches by different
items. Alternatively, the lncRNAs associated with specific MeSH ontology terms are available by clicking on
the hyperlink.

15. Selecting “Disease” or “MeSH Ontology” in the search box pull-down menu
provides the results of a group of lncRNAs related with specific diseases (Fig. 14).
Similarly, the MeSH Ontology hyperlink at the top of the page provides access to
the specific list of lncRNAs (Fig. 14).
16. Click the gene symbol to access the annotation page in LncRNAWiki.
17. Click “Predicted” toward the top of the page to access the predicted disease-
associated lncRNAs (Fig. 15), which are obtained based on the pathogenic evidence
from methylation, variation, and interaction.
18. View supportive evidence listed in the summary table (Fig. 15). Supportive evidence
is indicated with a checkmark, and classification category and expression pattern
are also provided.
19. Search a specific predicted disease-associated lncRNA or a group of such lncRNAs
by selecting Transcript ID or Classification in the pull-down search menu (Fig. 15).
20. Click “Methylation Change,” “Pathogenic Variation,” or “All Grades” to browse
a group of lncRNAs supported by methylation evidence, pathogenic variation
evidence, or one or more types of evidence, respectively (Fig. 15).
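
The evidence-based selection in the Predicted section can be pictured as counting how many evidence types support each lncRNA. The sketch below is illustrative only: the record layout, the transcript names, and the rule that any single evidence type is sufficient are assumptions rather than LncBook's documented grading scheme.

```python
EVIDENCE_TYPES = ("methylation", "variation", "interaction")


def evidence_count(record: dict) -> int:
    """Number of evidence types flagged as present for one lncRNA."""
    return sum(1 for key in EVIDENCE_TYPES if record.get(key))


def select_by_evidence(records, required=None):
    """Keep records with the required evidence type, or with any evidence if None."""
    if required is None:
        return [r for r in records if evidence_count(r) >= 1]
    return [r for r in records if r.get(required)]


# Toy records with placeholder transcript names.
records = [
    {"transcript": "demo_A", "methylation": True, "variation": False, "interaction": True},
    {"transcript": "demo_B", "methylation": False, "variation": True, "interaction": False},
    {"transcript": "demo_C", "methylation": False, "variation": False, "interaction": False},
]
print([r["transcript"] for r in select_by_evidence(records)])
print([r["transcript"] for r in select_by_evidence(records, "methylation")])
```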
Download
21. Click “Download” in the navigation menu to access the Download page.
LncBook is an open access database distributed under the terms of the Creative Commons
Attribution Noncommercial License, which permits unrestricted noncommercial use,
distribution, and reproduction in any medium, provided the original work is properly cited.
In the Download page, users can freely download various data including lncRNA
sequence in FASTA format; annotation files in GTF and GFF formats; lists of featured
lncRNAs; and function, disease, expression, methylation, variation, and interaction data.

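
Once the FASTA and GTF files have been downloaded, they can be read with standard-library Python. The sketch below parses FASTA text into a dictionary and splits a GTF attribute column into key-value pairs. The attribute keys `gene_id` and `transcript_id` follow the general GTF convention; whether the LncBook files use exactly these keys is an assumption, and the IDs shown are placeholders that only borrow the HSALNT and HSALNG prefixes.

```python
def parse_fasta(text: str) -> dict:
    """Map each record name (first word of the header) to its full sequence."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def parse_gtf_attributes(field: str) -> dict:
    """Split the ninth GTF column into a dictionary of attributes."""
    attributes = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attributes[key] = value.strip('"')
    return attributes


fasta_text = ">HSALNT0000001 placeholder record\nACGT\nGGCC\n"
gtf_field = 'gene_id "HSALNG0000001"; transcript_id "HSALNT0000001";'
print(parse_fasta(fasta_text))
print(parse_gtf_attributes(gtf_field))
```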
Figure 15 Browse predicted disease-associated lncRNAs. Predicted allows specific searches by differ-
ent items including transcript ID and classification. In addition, lncRNAs associated with different kinds of
pathogenic evidence are available by clicking on the hyperlinks in Methylation Change, Pathogenic Variation,
and All Grades.

COMMENTARY
Background Information
Human lncRNAs in LncRNAWiki and LncBook were integrated from different
sources. BLAST alignment (Altschul, Gish, Miller, Myers, & Lipman, 1990) or
Cuffcompare comparison (Trapnell et al., 2012) was used to identify redundant lncRNAs.
However, LncBook used the updated data of existing databases and also identified novel
lncRNAs based on RNA-seq data in HPA.
LncBook provides the most comprehensive list of human lncRNAs to date. Specifically,
we collected existing human lncRNAs from GENCODE version 27 (Derrien et al., 2012),
NONCODE version 5.0 (Fang et al., 2018), LNCipedia version 4.1 (Volders et al., 2019),
and MiTranscriptome beta (Iyer et al., 2015).
To obtain high-confidence lncRNAs in LncBook, a set of strict criteria was adopted by
considering redundancy, background noise, mapping error, incomplete transcript, length,
and coding potential. On the other hand, we integrated the experimentally validated
lncRNAs, which were sourced from LncRNAWiki. The RefSeq and Ensembl references were
obtained from the HUGO Gene Nomenclature Committee (HGNC) to enable genomic location
comparison. However, half of these lncRNAs are presently unable to be traced back to
their genomic locations and thus are not included.
To date LncBook contains a large collection of 247,246 existing lncRNAs. Also, novel
lncRNAs were identified based on the 122 RNA-seq data from HPA, and 21,815 novel
lncRNAs were identified. Finally, we obtained a total number of 270,044 lncRNAs.

Future Directions
In the future, LncBook will try to improve data quality by using more strict standards
and integrating high-quality annotations. We plan to integrate full-length lncRNAs from
additional databases such as FANTOM CAT (Hon et al., 2017) and BIGTranscriptome
(You, Yoon, & Nam, 2017). Also, we will maintain regular integration of newly discovered
lncRNAs to obtain a comprehensive list of human lncRNAs. Future developments of
LncBook also include incorporation of other omics data, such as RNA N6-methyladenosine
and RNA 5-methylcytosine modifications, and identification of differentially expressed
lncRNAs in normal and disease samples. This would provide users with more information
to find important functional lncRNAs.

Figure 16 Tools for online analysis. Tools can be accessed by clicking items in the pull-down list of the
navigation menu or by clicking “Tools” in the homepage.

Figure 17 BLAST results of the query sequence “ENST00000469225.1_1” against LncBook lncRNAs.

Suggestions for Further Analysis

This system is dedicated to searching, browsing, and annotating functional lncRNAs and browsing multi-omics data of the large number of predicted lncRNAs. Based on the abundant multi-omics data in various samples, including different human tissues and cancers and pathogenic information linked to genome variation, users could identify lncRNAs with potential important functions in tissue development or disease. Additionally, LncBook incorporates a series of tools including BLAST, coding potential prediction, genomic positional annotation, and ID conversion (Fig. 16), which would help users perform online analysis.

Figure 18 Coding potential prediction. (A) LGC main page. Clicking “Example Sequence” and “Run” will jump to the waiting page (B). (C) Prediction results.

Figure 19 Classifying a lncRNA based on its positional relationship with a protein-coding gene. In this example, clicking “Example” and “Run” shows that this is an intergenic lncRNA.

BLAST alignment
BLAST allows sequence similarity searches against the comprehensive list of human lncRNAs in LncBook (Fig. 17). For users who would like to test if a certain lncRNA has been included in LncBook or to try to find human lncRNA homologs, BLAST could be used to perform their analyses.

Figure 20 LncRNA ID conversion. Clicking “Example IDs” and “Run” will display IDs in other databases.

Coding potential prediction
LncBook provides coding potential prediction analysis using our in-house software LGC (Fig. 18), which distinguishes lncRNAs from mRNAs based on the relationship between ORF Length and GC content (Wang et al., 2019). LGC can be used in a cross-species manner without species-specific adjustments, and it is robustly effective across species that range from plants to mammals (Wang et al., 2019).
For users who would like to detect novel lncRNAs or to examine if a certain lncRNA predicted using other algorithms can also be identified as a lncRNA using a different method, LGC could be used to predict coding potential.

Classification
Classification is also an in-house tool which was developed to classify a lncRNA transcript based on its positional relationship with a protein-coding gene (Fig. 19). Users could use this tool to annotate the relative genomic location of their own lncRNAs.

LncRNA ID conversion
LncBook integrates a comprehensive list of human lncRNAs from existing databases, and this conversion tool allows transcript ID conversion among six different databases: LncBook, GENCODE, RefSeq, NONCODE, LNCipedia, MiTranscriptome (Fig. 20). Users could upload lncRNAs of a certain database and obtain the list of lncRNA IDs in other databases.

Acknowledgments
This work is supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA19050302, XDB13040500), the National Key Research and Development Program of China (2017YFC0907502, 2015AA020108), the 13th Five-Year Informatization Plan of the Chinese Academy of Sciences (XXH13505-05), the International Partnership Program of the Chinese Academy of Sciences (153F11KYSB20160008), and the National Natural Science Foundation of China (31871328).

Literature Cited
Abecasis, G. R., Altshuler, D., Auton, A., Brooks, L. D., Durbin, R. M., Gibbs, R. A., . . . McVean, G. A. (2010). A map of human genome variation from population-scale sequencing. Nature, 467, 1061–1073. doi: 10.1038/nature09534.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215, 403–410. doi: 10.1016/S0022-2836(05)80360-2.
Betel, D., Wilson, M., Gabow, A., Marks, D. S., & Sander, C. (2008). The microRNA.org resource: Targets and expression. Nucleic Acids Research, 36, D149–153. doi: 10.1093/nar/gkm995.
BIG Data Center Members. (2017). The BIG Data Center: From deposition to integration to translation. Nucleic Acids Research, 45, D18–D24. doi: 10.1093/nar/gkw1060.
BIG Data Center Members. (2018). Database resources of the BIG Data Center in 2018. Nucleic Acids Research, 46, D14–D20. doi: 10.1093/nar/gkx897.
BIG Data Center Members. (2019). Database Resources of the BIG Data Center in 2019. Nucleic Acids Research, 47, D8–D14. doi: 10.1093/nar/gky993.
Chen, G., Wang, Z., Wang, D., Qiu, C., Liu, M., Chen, X., . . . Cui, Q. (2013). LncRNADisease: A database for long-non-coding RNA-associated diseases. Nucleic Acids Research, 41, D983–986. doi: 10.1093/nar/gks1099.
Derrien, T., Johnson, R., Bussotti, G., Tanzer, A., Djebali, S., Tilgner, H., . . . Guigo, R. (2012). The GENCODE v7 catalog of human long non-coding RNAs: Analysis of their gene structure, evolution, and expression. Genome Research, 22, 1775–1789. doi: 10.1101/gr.132159.111.
Fang, S., Zhang, L., Guo, J., Niu, Y., Wu, Y., Li, H., . . . Zhao, Y. (2018). NONCODEV5: A comprehensive annotation database for long non-coding RNAs. Nucleic Acids Research, 46, D308–D314. doi: 10.1093/nar/gkx1107.
Forbes, S. A., Beare, D., Boutselakis, H., Bamford, S., Bindal, N., Tate, J., . . . Campbell, P. J. (2017). COSMIC: Somatic cancer genetics at high-resolution. Nucleic Acids Research, 45, D777–D783. doi: 10.1093/nar/gkw1121.
Hon, C. C., Ramilowski, J. A., Harshbarger, J., Bertin, N., Rackham, O. J., Gough, J., . . . Forrest, A. R. (2017). An atlas of human long non-coding RNAs with accurate 5′ ends. Nature, 543, 199–204. doi: 10.1038/nature21374.
Iyer, M. K., Niknafs, Y. S., Malik, R., Singhal, U., Sahu, A., Hosono, Y., . . . Chinnaiyan, A. M. (2015). The landscape of long noncoding RNAs in the human transcriptome. Nature Genetics, 47, 199–208. doi: 10.1038/ng.3192.
Landrum, M. J., Lee, J. M., Benson, M., Brown, G., Chao, C., Chitipiralla, S., . . . Maglott, D. R. (2016). ClinVar: Public archive of interpretations of clinically relevant variants. Nucleic Acids Research, 44, D862–868. doi: 10.1093/nar/gkv1222.
Lewis, B. P., Burge, C. B., & Bartel, D. P. (2005). Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell, 120, 15–20. doi: 10.1016/j.cell.2004.12.035.
Li, J. H., Liu, S., Zhou, H., Qu, L. H., & Yang, J. H. (2014). starBase v2.0: Decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Research, 42, D92–97. doi: 10.1093/nar/gkt1248.
Ma, L., Cao, J., Liu, L., Du, Q., Li, Z., Zou, D., . . . Zhang, Z. (2019). LncBook: A curated knowledgebase of human long non-coding RNAs. Nucleic Acids Research, 47, D128–D134. doi: 10.1093/nar/gky960.
Ma, L., Li, A., Zou, D., Xu, X., Xia, L., Yu, J., . . . Zhang, Z. (2015). LncRNAWiki: Harnessing community knowledge in collaborative curation of human long non-coding RNAs. Nucleic Acids Research, 43, D187–192. doi: 10.1093/nar/gku1167.
Sherry, S. T., Ward, M. H., Kholodov, M., Baker, J., Phan, L., Smigielski, E. M., & Sirotkin, K. (2001). dbSNP: The NCBI database of genetic variation. Nucleic Acids Research, 29, 308–311. doi: 10.1093/nar/29.1.308.
The GTEx Consortium. (2015). Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science, 348, 648–660. doi: 10.1126/science.1262110.
Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D. R., . . . Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols, 7, 562–578. doi: 10.1038/nprot.2012.016.
Uhlen, M., Fagerberg, L., Hallstrom, B. M., Lindskog, C., Oksvold, P., Mardinoglu, A., . . . Ponten, F. (2015). Proteomics. Tissue-based map of the human proteome. Science, 347, 1260419. doi: 10.1126/science.1260419.
Volders, P. J., Anckaert, J., Verheggen, K., Nuytens, J., Martens, L., Mestdagh, P., & Vandesompele, J. (2019). LNCipedia 5: Towards a reference set of human long non-coding RNAs. Nucleic Acids Research, 47, D135–D139. doi: 10.1093/nar/gky1031.
Volders, P. J., Verheggen, K., Menschaert, G., Vandepoele, K., Martens, L., Vandesompele, J., & Mestdagh, P. (2015). An update on LNCipedia: A database for annotated human lncRNA sequences. Nucleic Acids Research, 43, D174–D180. doi: 10.1093/nar/gku1060.
Wang, G., Yin, H., Li, B., Yu, C., Wang, F., Xu, X., . . . Zhang, Z. (2019). Characterization and identification of long non-coding RNAs based on feature relationship. Bioinformatics, Epub ahead of print. doi: 10.1093/bioinformatics/btz008.
Yanai, I., Benjamin, H., Shmoish, M., Chalifa-Caspi, V., Shklar, M., Ophir, R., . . . Shmueli, O. (2005). Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics, 21, 650–659. doi: 10.1093/bioinformatics/bti042.
You, B. H., Yoon, S. H., & Nam, J. W. (2017). High-confidence coding and noncoding transcriptome maps. Genome Research, 27, 1050–1062. doi: 10.1101/gr.214288.116.
