
Fig 1 : Home page of NCBI

MEDLINE is the premier bibliographic database of the U.S. National Library of Medicine® (NLM);
it contains more than 24 million references to journal articles in the life sciences
with a concentration on biomedicine. A distinctive feature of MEDLINE is that the records
are indexed with NLM Medical Subject Headings. MEDLINE is the online counterpart to
MEDLARS® (MEDical Literature Analysis and Retrieval System) that originated in 1964.
The great majority of journals are selected for MEDLINE based on the recommendation
of the Literature Selection Technical Review Committee (LSTRC), an NIH-chartered
advisory committee of external experts analogous to the committees that review NIH
grant applications. Some additional journals and newsletters are selected based on
NLM-initiated reviews, e.g., history of medicine, health services research, AIDS,
toxicology and environmental health, molecular biology, and complementary medicine,
that are special priorities for NLM or other NIH components. These reviews generally
also involve consultation with an array of NIH and outside experts or, in some cases,
external organizations with which NLM has or had special collaborative arrangements.
MEDLINE is the primary component of PubMed®, part of the Entrez series of databases
provided by the NLM National Center for Biotechnology Information (NCBI).
MEDLINE includes literature published from 1966 to present, and selected coverage of
literature prior to that period.
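Because MEDLINE records are indexed with Medical Subject Headings and are reachable through the Entrez system, a search can also be expressed as a URL for NCBI's E-utilities web service. The following is a minimal sketch that only builds such a URL and does not send any request. The function name `pubmed_mesh_search_url` and the example heading are illustrative assumptions, not part of the original text.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def pubmed_mesh_search_url(mesh_heading, retmax=20):
    """Build an ESearch URL that looks up PubMed records by MeSH heading.

    Hypothetical helper for illustration; it constructs the URL only.
    """
    # The [MeSH Terms] field tag restricts the match to the MeSH index.
    term = f'"{mesh_heading}"[MeSH Terms]'
    query = urlencode({
        "db": "pubmed",      # search the PubMed database
        "term": term,        # the query string
        "retmax": retmax,    # maximum number of IDs to return
        "retmode": "json",   # ask for a JSON response
    })
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


# "Myosins" is used here only as an example heading.
print(pubmed_mesh_search_url("Myosins", retmax=5))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return a list of matching PubMed identifiers.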
A growing number of MEDLINE citations contain a link to the free full text of the article
archived in PubMed Central® or to other sites. You can also link from many MEDLINE
references to the Web site of the publisher or other full text provider to request or view
the full article, depending upon the publisher's access requirements. For articles not
freely available on the Web, the "Loansome Doc®" feature in PubMed provides an easy
way to place an electronic order through the National Network of Libraries of
Medicine® (NN/LM®) for the full-text copy of an article cited in MEDLINE. Registration is
required and local fees may apply for this service.
MEDLINE data are also available via services and products developed by
organizations that download the database from NLM. Access to various MEDLINE
services is often available from medical libraries, many public libraries, and commercial
sources.
MedlinePlus®, another service offered by the NLM, provides consumer-oriented health
information. Health consumers are encouraged to discuss search results with their
health care provider.
The GenBank sequence database is an open access, annotated collection of all publicly
available nucleotide sequences and their protein translations. This database is produced and maintained by
the National Center for Biotechnology Information (NCBI) as part of the International Nucleotide Sequence Database
Collaboration (INSDC). The National Center for Biotechnology Information is a part of the National Institutes of Health in
the United States.

History

Walter Goad of the Theoretical Biology and Biophysics Group at Los Alamos National Laboratory and others
established the Los Alamos Sequence Database in 1979, which culminated in 1982 with the creation of the public
GenBank.[5] Funding was provided by the National Institutes of Health, the National Science Foundation, the Department
of Energy, and the Department of Defense. LANL collaborated on GenBank with the firm Bolt, Beranek, and Newman,
and by the end of 1983 more than 2,000 sequences were stored in it.
In the mid 1980s, the Intelligenetics bioinformatics company at Stanford University managed the GenBank project in
collaboration with LANL.[6] As one of the earliest bioinformatics community projects on the Internet, the GenBank project
started BIOSCI/Bionet news groups for promoting open access communications among bioscientists. From 1989 to
1992, the GenBank project transitioned to the newly created National Center for Biotechnology Information.

Sequencing

Only original sequences can be submitted to GenBank. Direct submissions are made to GenBank using BankIt, which is
a Web-based form, or the stand-alone submission program, Sequin. Upon receipt of a sequence submission, the
GenBank staff checks the originality of the data, assigns an accession number to the sequence, and performs
quality assurance checks. The submissions are then released to the public database, where the entries are retrievable
by Entrez or downloadable by FTP. Bulk submissions of Expressed Sequence Tag (EST), Sequence-tagged
site (STS), Genome Survey Sequence (GSS), and High-Throughput Genome Sequence (HTGS) data are most often
submitted by large-scale sequencing centers. The GenBank direct submissions group also processes complete
microbial genome sequences.
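
Retrieval of a released entry through Entrez can be sketched with the E-utilities EFetch endpoint. The snippet below only builds the URL for a given accession number and performs no network access. The helper name `genbank_efetch_url` is an assumption made for illustration, and `U49845` is used merely as a sample accession.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def genbank_efetch_url(accession, rettype="gb"):
    """Build an EFetch URL for one nucleotide record.

    Hypothetical helper for illustration. rettype "gb" requests the
    GenBank flat file, "fasta" requests the sequence in FASTA format.
    """
    if rettype not in ("gb", "fasta"):
        raise ValueError("this sketch supports only 'gb' and 'fasta'")
    query = urlencode({
        "db": "nuccore",      # Entrez nucleotide database
        "id": accession,      # accession number assigned at submission
        "rettype": rettype,   # record format
        "retmode": "text",    # plain-text response
    })
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


print(genbank_efetch_url("U49845"))
print(genbank_efetch_url("U49845", rettype="fasta"))
```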
UniGene

Unigene : url:

The UniGene System

UniGene is an experimental system for automatically partitioning
GenBank sequences into a non-redundant set of gene-oriented
clusters. Each UniGene cluster contains sequences that represent a
unique gene, as well as related information such as the tissue types in
which the gene has been expressed and map location.

In addition to sequences of well-characterized genes, hundreds of
thousands of novel expressed sequence tag (EST) sequences have been
included. Consequently, the collection may be of use to the
community as a resource for gene discovery. UniGene has also been
used by experimentalists to select reagents for gene mapping projects
and large-scale expression analysis.

However, it should be noted that the procedures for automated
sequence clustering are still under development and the results may
change from time to time as improvements are made. Feedback from
users has been especially useful in identifying problems and we
encourage you to report any problems you encounter.

It should also be noted that no attempt has been made to produce
contigs or consensus sequences. There are several reasons why the
sequences of a set may not actually form a single contig. For
example, all of the splicing variants for a gene are put into the same
set. Moreover, EST-containing sets often contain 5' and 3' reads from
the same cDNA clone, but these sequences do not always overlap.
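
The difference between a cluster and a contig can be illustrated with a toy partitioning routine. This is a sketch only, not the UniGene build procedure: it assumes that pairwise links between sequences (for example, a sequence overlap or a shared cDNA clone) are already known, and it groups linked sequences with a union-find structure. All sequence names are hypothetical.

```python
def cluster_sequences(seq_ids, links):
    """Partition sequence IDs into clusters connected by pairwise links."""
    parent = {s: s for s in seq_ids}

    def find(x):
        # Follow parent pointers to the cluster representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    groups = {}
    for s in seq_ids:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(group) for group in groups.values())


seq_ids = ["mRNA_A", "EST_5p", "EST_3p", "EST_other"]
links = [
    ("mRNA_A", "EST_5p"),   # assumed: the 5' read overlaps the mRNA
    ("EST_5p", "EST_3p"),   # assumed: both reads come from the same clone
]
print(cluster_sequences(seq_ids, links))
# [['EST_3p', 'EST_5p', 'mRNA_A'], ['EST_other']]
```

In this example the 3' read joins the cluster only through its clone link, so the members of the cluster belong together without necessarily assembling into one contiguous sequence.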

Currently, sequences from the animals human, rat, mouse, cow,
zebrafish, and clawed frog have been processed, along with the plants
wheat, rice, barley, maize, and cress. These species were chosen
because they have the greatest amounts of EST data available and
represent a variety of species. Additional organisms may be added in
the future.

A representation of the UniGene datasets is available by ftp.

A description of the UniGene build procedure is available.


Fig 1:
Protein Database
The Protein database includes protein sequence records from a variety of sources, including GenPept, RefSeq,
Swiss-Prot, PIR, PRF, and PDB.

Fig 1: Query of protein

Fig 2 : output: for myosin protein in homo sapiens


Fig 3: Flat file format for myosin

Fig 4: FASTA SEQUENCE FOR MYOSIN PROTEIN
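
A FASTA record consists of a header line beginning with ">" followed by one or more lines of sequence. As a minimal sketch of how such output could be read programmatically, the parser below collects each header with its joined sequence. The function name `parse_fasta` and the two records are made-up placeholders, not real myosin data.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder input: the identifiers and residues are invented.
example = """>EXAMPLE1 toy placeholder record
MKTAYIAK
QRQISFVK

>EXAMPLE2 second placeholder
MSDE
"""

for header, sequence in parse_fasta(example):
    print(header, len(sequence))
```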


Introduction – Swiss-Prot

Swiss-Prot is a manually curated biological database of protein sequences started by Amos
Bairoch and developed by the Swiss Institute of Bioinformatics and the European
Bioinformatics Institute. Swiss-Prot strives to provide reliable protein sequences associated
with a high level of annotation, a minimal level of redundancy, and a high level of
integration with other databases. The annotation covers, among other things:

• the protein's domain structure,
• post-translational modifications,
• variants.

Swiss-Prot and its automatically curated supplement TrEMBL have joined with the Protein
Information Resource protein database to produce the UniProt Knowledgebase, the world's
most comprehensive catalogue of information on proteins. The UniProt consortium produced
three database components, each optimised for different uses:

• The UniProt Knowledgebase (UniProtKB), comprising Swiss-Prot and TrEMBL;
• The UniProt Non-redundant Reference (UniRef) databases, which combine closely
  related sequences into a single record to speed similarity searches; and
• The UniProt Archive (UniParc), which is a comprehensive repository of protein
  sequences, reflecting the history of all protein sequences.

As of 11 July 2012, release "2012_07" of UniProtKB/Swiss-Prot contains 536,789 sequence
entries (comprising 190,518,892 amino acids abstracted from 211,308 references) and release
"2012_06" of UniProtKB/TrEMBL contains 22,660,469 sequence entries (comprising
7,407,531,063 amino acids).
