Bio PPT


INTRODUCTION TO BIOINFORMATICS:

- The term bioinformatics is now used to cover almost all computer applications in the biological sciences, but it was originally coined in the mid-1980s for the analysis of biological sequence data.
- In the broadest sense, bioinformatics describes any use of computers to handle biological information.
- In practice, most people use a narrower definition: to them, bioinformatics is a synonym for “computational molecular biology”.
- Most biologists talk about “doing bioinformatics” when they use computers to store, retrieve, analyze, or predict the composition or the structure of biomolecules.

Bioinformatics = Biology + Information

- Computational methods are necessary to analyze the massive amount of information coming out of the genome projects.
- Bioinformatics is not just using a computer to store data or to speed up biology.
- With bioinformatics, you test biological hypotheses on a computer.

Easy answer: using computers to solve molecular biology problems.

Hard answer: computational techniques for the management and analysis of biological data and knowledge.

Why Learn Bioinformatics?

Bioinformatics provides a set of tools that is essential in modern biomedical research (and in other research that uses biological molecules).

There is also a basic science motivation:

Bioinformatics will provide new insights into biological function.

Bioinformatics techniques have reinforced that:

• Simple organisms are models for more complex systems
• DNA and protein similarities illustrate evolutionary history
• Gene variation in populations is important for drug discovery
• Genetic data have important personal and social implications
• Noncoding DNA is important for gene control

Bioinformatics is mainly used in:

- Sequence alignment.
- Protein structure alignment.
- Protein structure prediction.
- Prediction of gene expression.
- Protein-protein interactions.
- Finding evolutionary relationships.

Primary sequence databases:

- EMBL (European Molecular Biology Laboratory nucleotide sequence database at EBI, Hinxton, UK)
- GenBank (at the National Center for Biotechnology Information, NCBI, Bethesda, MD, USA)
- DDBJ (DNA Data Bank of Japan at CIB, Mishima, Japan)

3D structure databases:

- PDB (Protein Data Bank, curated by RCSB, USA)
- EBI-MSD (Macromolecular Structure Database at EBI, UK)
- NDB (Nucleic Acid Structure Database at Rutgers, The State University of New Jersey, USA)

Protein sequence databases:

- SWISS-PROT (Swiss Institute of Bioinformatics, SIB, Geneva, CH)
- TrEMBL (Translated EMBL: computer-annotated protein sequence database at EBI, UK)
- PIR-PSD (PIR-International Protein Sequence Database, annotated protein database by PIR, MIPS and JIPID at NBRF, Georgetown University, USA)
- UniProt (joined data from Swiss-Prot, TrEMBL and PIR)
- UniRef (UniProt NREF, Non-redundant REFerence database at EBI, UK)
- IPI (International Protein Index: human, rat and mouse proteome database at EBI, UK)

Protein classification databases:

- CluSTr (clusters of SWISS-PROT and TrEMBL proteins at EBI, UK)
- Pfam (protein families database of alignments and HMMs at the Sanger Centre, UK)
- SCOP (Structural Classification of Proteins according to family, superfamily, common fold, and class)
- CATH (protein structure classification based on Class, Architecture, Topology, and Homologous superfamilies)
- Dali (Dali fold classification based on structure-structure alignment of proteins, at Helsinki University, Finland)

Sequence similarity search tools:

- BLAST (Basic Local Alignment Search Tool at NCBI, USA)
- FASTA (fasta or fastx search at EBI, UK)

Structure similarity search tools:

- VAST (structure-structure similarity search at NCBI, USA)
- DALI (protein structure comparison with PDB at EBI, UK)

Introduction - a short history of sequence databases:

• 1960s: Margaret Dayhoff (PIR, Protein Information Resource) collected all known protein sequences
• 1971: PDB (Protein Data Bank) established with 7 structures
• 1972: Distribution of PIR data on magnetic tape
• 1978: “Atlas of Protein Sequence and Structure” published
• 1982: EMBL (European Molecular Biology Laboratory) and GenBank (NCBI @ NIH)
• 1986: Swiss-Prot distributed on US BIONET (3900 sequences)
• 1987: DDBJ (DNA Data Bank of Japan) released
• 1988: International Nucleotide Sequence Database Collaboration: EMBL, GenBank, DDBJ

International Nucleotide Sequence Database Collaboration:

(1) NCBI - Overview:

NCBI - National Center for Biotechnology Information

- Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.
- NCBI provides data retrieval systems and computational resources for the analysis of data in GenBank and other biological data made available through NCBI’s website.

DATABASE RETRIEVAL TOOLS:

Entrez:
- Entrez is an integrated database retrieval system that enables text searching, using simple queries, across a diverse set of more than 20 databases, several of which were added during the past year.
- A newly implemented Global Query, the default search on the NCBI homepage, allows simultaneous searches across all the Entrez databases at speeds comparable to a single-database search.
- The Entrez databases include DNA and protein sequences derived from several sources, the NCBI taxonomy, genomes, population sets, gene expression data, sequence-tagged sites in UniSTS, genetic variations in dbSNP, protein structures from the Molecular Modeling Database (MMDB), and three-dimensional (3D) and alignment-based protein domains.
- It provides users with integrated access to sequence, mapping, taxonomy, and structural data.
- It also provides graphical views of sequences and chromosome maps.
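
As a rough illustration of an Entrez text query, the sketch below only builds a search URL and does not contact NCBI. It assumes NCBI's E-utilities `esearch.fcgi` endpoint, which the slides do not mention; the helper name `build_esearch_url` and the example query are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: NCBI's public E-utilities base URL (not stated in the slides).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(database, term, max_results=20):
    """Build (but do not send) an Entrez text-search URL.

    database    -- name of an Entrez database, e.g. "nucleotide"
    term        -- the text query, optionally with a field tag
    max_results -- upper limit on the number of record IDs returned
    """
    params = {"db": database, "term": term, "retmax": max_results}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Hypothetical query: look up one accession in the nucleotide database.
url = build_esearch_url("nucleotide", "U54469[Accession]")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return a list of matching record identifiers.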
(2) EMBL - European Molecular Biology Laboratory:

- The EMBL Nucleotide Sequence Database is located and maintained at the EBI (European Bioinformatics Institute).
- It offers a comprehensive set of publicly available nucleotide sequences and annotation, freely accessible to all.
- The International Nucleotide Sequence Database Collaboration (INSDC) comprises the EMBL Nucleotide Sequence Database at EMBL-EBI, the DNA Data Bank of Japan, and GenBank in the USA.
- A key goal of the EMBL Nucleotide Sequence Database is to integrate nucleotide sequence and annotation into the bioinformatics resources also offered at the EBI.

• Sequence Data:
- Nucleotide sequences are always listed in the 5' to 3' direction, regardless of the published order.
- Bases are numbered sequentially, beginning with 1 at the 5' end of the sequence.
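
Because database coordinates start at 1 while most programming languages index from 0, a small conversion is needed when extracting a region. The sketch below shows one way to do it; the function name `subsequence` and the example sequence are made up for illustration.

```python
def subsequence(seq, start, end):
    """Return bases start..end of a 5'->3' sequence.

    Coordinates follow the database convention: 1-based and inclusive,
    with position 1 at the 5' end.
    """
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates out of range")
    # Python slices are 0-based and end-exclusive, so shift only the start.
    return seq[start - 1:end]


seq = "ATGCCGTA"                       # hypothetical sequence, written 5' to 3'
first_three = subsequence(seq, 1, 3)   # bases 1 to 3
last_base = subsequence(seq, 8, 8)     # base 8 only
print(first_three, last_base)
```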

EMBL Database Divisions:

Division                      Code
Human                         HUM
Rodents                       ROD
Mouse                         MUS
Other Mammals                 MAM
Other Vertebrates             VRT
Invertebrates                 INV
Fungi                         FUN
Plants                        PLN
Prokaryotes                   PRO
Organelles                    ORG
Viral                         VRL
Bacteriophage                 PHG
Synthetic                     SYN
Unclassified / Unannotated    UNC
Patent                        PAT
High Throughput cDNA          HTC
High Throughput Genome        HTG
Expressed Sequence Tags       EST
Sequence Tagged Sites         STS
Genome Survey Sequence        GSS

- The first subset (colored dark blue) contains all eukaryotic DNA sequences, the second (colored black) contains all prokaryotic DNA sequences, the third (colored green) contains viruses, the fourth (colored red) contains multi-species sequences, and the fifth (colored light blue) contains tags.
o Database Formats:
- The elementary format underlying the information held in DDBJ/EMBL/GenBank is the flatfile.
- The correspondence between the individual flatfile formats facilitates the exchange of data between these databases.
- The simplest format for representing a sequence is FASTA: the greater-than character (>) marks the beginning of a new sequence record, and the line it starts is known as the “definition line”.

For example:

>U54469
CGGGGGCGGGTCGGGGGGGGAAAAATCCCCCTCT
GGCCCCCCCCCCCCTTCTGGGTGGTCAACGT
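
A minimal sketch of reading this format follows, assuming the input is plain FASTA text; the function name `parse_fasta` is hypothetical. Each `>` line starts a new record, and the sequence lines that follow are joined into one string.

```python
def parse_fasta(text):
    """Return a list of (definition_line, sequence) tuples from FASTA text."""
    records = []
    header = None      # definition line of the record being read
    chunks = []        # sequence lines collected for that record
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:        # close the previous record
                records.append((header, "".join(chunks)))
            header = line[1:]             # drop the leading ">"
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:                # close the last record
        records.append((header, "".join(chunks)))
    return records


example = """>U54469
CGGGGGCGGGTCGGGGGGGGAAAAATCCCCCTCT
GGCCCCCCCCCCCCTTCTGGGTGGTCAACGT
"""
records = parse_fasta(example)
for name, sequence in records:
    print(name, len(sequence))
```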
- Flatfiles can be separated into three major parts:

• The header, which contains the information (descriptors) that applies to the entire record.
• The features, which are the annotations on the record.
• The nucleotide sequence itself.

- All major nucleotide database flatfiles end with // on the last line of the record.
- The first line of every flatfile is the LOCUS line in DDBJ and GenBank, which is equivalent to the ID line in EMBL.
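
The three-part layout can be sketched in code as follows. This assumes the GenBank/DDBJ flavour of the flatfile, where the `FEATURES` keyword opens the feature table and `ORIGIN` opens the sequence (these keywords are not given on the slides). The record below is an invented, heavily abbreviated example, and `split_genbank_record` is a hypothetical helper.

```python
def split_genbank_record(record):
    """Split one GenBank-style flatfile record into its three parts.

    Returns (header_lines, feature_lines, sequence_string).
    """
    header, features, sequence = [], [], []
    section = header                     # records start with the header
    for line in record.splitlines():
        if line.startswith("//"):        # end-of-record marker
            break
        if line.startswith("FEATURES"):  # assumed start of the feature table
            section = features
        elif line.startswith("ORIGIN"):  # assumed start of the sequence
            section = sequence
            continue                     # the ORIGIN line itself holds no bases
        section.append(line)
    # Sequence lines carry position numbers and spaces; keep letters only.
    bases = "".join(ch for line in sequence for ch in line if ch.isalpha())
    return header, features, bases


# Invented record for illustration only; real records hold many more fields.
SAMPLE = """LOCUS       EXAMPLE1       12 bp    DNA     linear   SYN
DEFINITION  Hypothetical record for illustration only.
ACCESSION   EXAMPLE1
FEATURES             Location/Qualifiers
     source          1..12
ORIGIN
        1 acgtacgtac gt
//
"""
header, features, bases = split_genbank_record(SAMPLE)
print(len(header), len(features), bases)
```

An EMBL-format record would need different markers, since it begins with an ID line rather than LOCUS.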
(3) DDBJ:
- DDBJ (DNA Data Bank of Japan) began DNA data bank activities in earnest in 1986 at the National Institute of Genetics (NIG).
- From the beginning, DDBJ has functioned as one of the International DNA Databases, alongside EBI (European Bioinformatics Institute, responsible for the EMBL database) in Europe and NCBI (National Center for Biotechnology Information, responsible for the GenBank database) in the USA.
- DDBJ is the sole DNA data bank in Japan that is officially certified to collect DNA sequences from researchers and to issue internationally recognized accession numbers to data submitters.
- DDBJ collects data mainly from Japanese researchers, but it also accepts data from, and issues accession numbers to, researchers in any other country.
- DDBJ exchanges the collected data with EMBL/EBI and GenBank/NCBI, so the three data banks share essentially the same data.
(4) UniProtKB/Swiss-Prot:

- It is a manually annotated protein knowledgebase.
- It was established in 1986 and has been maintained since 2003 by the UniProt Consortium, a collaboration between the Swiss Institute of Bioinformatics (SIB) and the Department of Bioinformatics and Structural Biology of the University of Geneva, the European Bioinformatics Institute (EBI), and the Georgetown University Medical Center's Protein Information Resource (PIR).
- Through collaboration with other databases, it gives access to all publicly available protein sequences.
