
APPLICATION OF BIOINFORMATICS IN MOLECULAR
BIOLOGY AND CURRENT RESEARCH
Dr. Ruchi Yadav
Assistant Professor
Amity University Uttar Pradesh Lucknow Campus
Bioinformatics

• The science of storing, retrieving and analyzing large amounts of biological information
• An interdisciplinary science, involving biologists, computer scientists and mathematicians
• At the heart of modern biology

Bioinformatics (Oxford English Dictionary):

• The branch of science concerned with information and information flow in biological systems, esp. the use of computational methods in genetics and genomics.
The field of science in which biology, computer science and
information technology merge into a single discipline
Biologists
collect molecular data: DNA & protein sequences, gene expression, etc.

Bioinformaticians
study biological questions by analyzing molecular data

Computer scientists (+ mathematicians, statisticians, etc.)
develop tools, software and algorithms to store and analyze the data.
“Large-scale” focus

• Data explosion and new types of data
• High-throughput biology
• Emphasis on systems
ONLINE DATABASES FOR
MOLECULAR BIOLOGY
Types of data
• Literature and ontologies
• Genomes
• Protein sequence
• DNA & RNA sequence
• Protein structure
• Gene expression
• Chemical entities
• Protein families, motifs and domains
• Protein interactions
• Pathways
• Systems
What does it all look like on a computer monitor?
Databases: molecules to systems
• Literature and ontologies: PubChem, GO
• Genomes: Ensembl, Ensembl Genomes
• Protein families, motifs and domains: Pfam, SCOP, InterPro
• Nucleotide sequence: EMBL, GenBank
• Microarray & gene expression data: ArrayExpress, GEO
• Protein structure: PDB
• Protein interactions: IntAct, STRING
• Pathways: KEGG
• Proteomes: UniProt, Swiss-Prot
• Chemical entities: PubChem
• Systems: BioModels
The “OMICS” cascade
‘Omics’: the discipline of analyzing the interactions of
biological information objects in the various ‘omes’.

• genome
• transcriptome
• proteome
• metabolome
GENOMICS

GENOMICS
• Genomics is the study of genomes, including
large chromosomal segments containing
many genes.
• The initial phase of genomics aims to map
and sequence an initial set of entire
genomes.
• Functional genomics aims to deduce
information about the function of DNA
sequences.
Human Genome Project Background
• The idea of sequencing the entire human genome was first proposed in discussions at scientific meetings organized by the US Department of Energy and others from 1984 to 1986.
• A broader programme was then recommended, to include:
   - the creation of genetic, physical and sequence maps of the human genome;
   - parallel efforts in key model organisms such as bacteria, yeast, worms, flies and mice;
   - development of technology in support of these objectives;
   - research into the ethical, legal and social issues raised by human genome research.
Timeline of large-scale genomic analyses.
The Human Genome Project

The human genome sequence is complete (almost): approximately 3 billion base pairs.
Whole genome sequencing has now become routine
https://www.ncbi.nlm.nih.gov/

GENOMES
https://www.ncbi.nlm.nih.gov/guide/genomes-maps/

https://www.coronavirus.gov/
https://www.nih.gov/coronavirus
https://www.ncbi.nlm.nih.gov/sars-cov-2/
EBI Overview
TRANSCRIPTOMICS

Gene expression
• A human organism has over 250 different cell
types (e.g., muscle, skin, bone, neuron), most
of which have identical genomes, yet they look
different and do different jobs
• It is believed that less than 20% of the genes
are ‘expressed’ (i.e., making RNA) in a typical
cell type
• Apparently, the differences in gene expression
are what make the cells different

Some questions for the golden age of
genomics
• How does gene expression differ between cell
types?
• How does gene expression differ between a normal and
a diseased (e.g., cancerous) cell?
• How does gene expression change when a cell is
treated with a drug?
• How does gene expression change as the
organism develops and its cells differentiate?
• How is gene expression regulated: which genes
regulate it, and how?
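Several of these questions come down to comparing expression levels between two conditions. A minimal sketch of one common summary, the log2 fold change, is given below. The gene names, the expression values and the pseudocount are illustrative assumptions, not data from these slides, and a real analysis would also need normalization and a statistical test.

```python
import math

def log2_fold_change(control, treated, pseudocount=1.0):
    """Log2 ratio of mean expression (treated vs. control).

    The pseudocount (an assumed value) avoids division by zero
    for genes with no measured expression.
    """
    mean_control = sum(control) / len(control)
    mean_treated = sum(treated) / len(treated)
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))

# Hypothetical replicate measurements: (control, treated) per gene.
expression = {
    "GENE_A": ([10, 12, 11], [45, 50, 43]),
    "GENE_B": ([30, 28, 33], [31, 29, 30]),
    "GENE_C": ([80, 75, 90], [20, 18, 25]),
}

for gene, (control, treated) in expression.items():
    lfc = log2_fold_change(control, treated)
    # An absolute log2 fold change of 1 (a two-fold change) is a
    # frequently used, but arbitrary, cut-off.
    status = "changed" if abs(lfc) >= 1 else "unchanged"
    print(f"{gene}: log2FC = {lfc:+.2f} ({status})")
```

A positive value means higher expression in the treated sample, a negative value means lower expression.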
GENE EXPRESSION EXPERIMENTS

• Next-generation sequencing: RNA-Seq
• Microarray
• SAGE: serial analysis of gene expression
• RNAi: RNA interference, siRNA, miRNA
Microarrays: tools for gene expression
A microarray is a solid support (such as a
membrane or glass microscope slide) on which
DNA of known sequence is deposited in a grid-like array.
MICROARRAY TYPES

Cancer and Microarray
Analysis of regulation

ArrayExpress & Atlas of Gene Expression

• ArrayExpress Archive is a public repository of functional genomics experiments, including gene expression, supporting scientific publications
• You can query it to retrieve experimental information and download functional genomics data
• Atlas of Gene Expression contains a subset of curated and re-annotated Archive data
• It can be queried for individual gene expression under different biological conditions across experiments
ARRAY EXPRESS (EBI)

Microarray Data Analysis

[Figure: microarray data analysis workflow.
Top row: Differential gene expression; Synergistic effect of RV+NGF; Co-expression analysis; Molecular docking.
Bottom row: Gene ontology and pathway; Synergistic expression; Protein-protein interaction; RV-NGF, MCP-NGF, DEGs-RV.]
https://www.ncbi.nlm.nih.gov/geo/

http://www.mirbase.org/
SAGE

• SAGE is a powerful tool that allows the analysis of
overall gene expression patterns with digital analysis.
• Produces a snapshot of the mRNA population in the
sample of interest.
• Because SAGE does not require a preexisting clone,
it can be used to identify and quantitate new genes
as well as known genes.
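The "digital" character of SAGE can be illustrated with a small counting sketch. It assumes the classic protocol settings, which these slides do not state: NlaIII (recognition site CATG) as the anchoring enzyme and a 10-bp tag taken immediately downstream of the 3'-most site. The transcript sequences are made up for illustration.

```python
from collections import Counter

ANCHOR_SITE = "CATG"   # assumed anchoring enzyme site (NlaIII)
TAG_LENGTH = 10        # assumed tag length in base pairs

def sage_tag(transcript, tag_length=TAG_LENGTH):
    """Return the tag following the 3'-most anchor site, or None."""
    position = transcript.rfind(ANCHOR_SITE)
    if position == -1:
        return None  # no anchor site: transcript yields no tag
    start = position + len(ANCHOR_SITE)
    tag = transcript[start:start + tag_length]
    return tag if len(tag) == tag_length else None

def count_tags(transcripts):
    """Count how often each tag occurs in a pool of transcripts."""
    counts = Counter()
    for transcript in transcripts:
        tag = sage_tag(transcript.upper())
        if tag is not None:
            counts[tag] += 1
    return counts

# Hypothetical mRNA pool: two copies of one transcript, one of another.
pool = [
    "AAACATGTTTCATGACGTACGTACGG",
    "AAACATGTTTCATGACGTACGTACGG",
    "GGGCATGTTGACCTGAAGTT",
]
for tag, count in count_tags(pool).most_common():
    print(tag, count)
```

Tag counts serve as the expression estimate, so no prior knowledge of the gene is needed to observe a tag.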
Schematic of SAGE method:
https://www.ncbi.nlm.nih.gov/sra

PROTEOMICS
Proteomics definition:
“The identification, characterization and quantification of all proteins involved in a particular pathway, organelle, cell, tissue, organ or organism that can be studied in concert to provide accurate and comprehensive data about that system.”
Proteome Complexity
Key concern of proteomics

• How to reduce the complexity of samples
• Separate the proteins in an electrical field: electrophoresis
Proteomics branches
expression proteomics: the question
Proteomics methods
Typical Proteome Experiment
https://www.expasy.org/
https://www.uniprot.org/
SWISS-2D PAGE
• Two-dimensional polyacrylamide gel electrophoresis database
Protein Identification using Mass Spectrometry

[Figure: identification workflow.
Protein from gel/PVDF/LC fraction (1-DE, 2-DE, LC)
→ tryptic digestion & peptide extraction (example peptides: TYGGAAR, EHICLLGK, GANK, PSTTGVEMFR)
→ mass spectrometry: peptide mass fingerprints of unmodified and modified peptides → PMF identification
→ MS fragmentation → mass spectrometry: peptide MS fragments → MS/MS identification]
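The digestion and fingerprinting steps can be sketched in a few lines. The example below assumes the common trypsin rule (cleave after K or R, but not when the next residue is P) and uses standard monoisotopic residue masses. The test protein is simply three of the peptides from the figure joined together for illustration, not a real protein sequence.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # added once per peptide for the free termini

def tryptic_digest(protein):
    """Split a protein after K or R, unless followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides

def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS

# Illustrative sequence built from peptides shown in the figure.
protein = "TYGGAAREHICLLGKGANK"
for peptide in tryptic_digest(protein):
    print(f"{peptide:10s} {peptide_mass(peptide):10.4f} Da")
```

The resulting list of masses is the theoretical fingerprint that a PMF search compares against the measured peaks. Modified peptides would shift by the mass of the modification, which this sketch does not model.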

Swiss Institute of Bioinformatics
Tandem mass spectrometry (MS/MS) for mapping
posttranslational modifications
List of PTMs and their properties
http://www.matrixscience.com/search_form_select.html
https://www.rcsb.org/

Structures: PDB
• Sequence mapping
• Linking to domain data
• Ligands
• Assemblies
• Electron density visualization
• Active sites
• Surface matching
• Fold matching
Molecular interaction database: IntAct
https://string-db.org/

https://pubchem.ncbi.nlm.nih.gov/

https://zinc.docking.org/

Pathways: Reactome
• A free, online, open-source curated database of pathways and reactions in human biology
• Information in the database is authored by expert biologist researchers and maintained by Reactome editorial staff
• Used to infer orthologous events in 22 non-human species including mouse, rat, chicken, puffer fish, worm, fly, yeast
• Extensively cross-referenced to other resources, e.g. NCBI, Ensembl, UCSC Genome Browser, UniProt, PubMed, KEGG, ChEBI and GO.
https://www.genome.jp/kegg/pathway.html
http://geneontology.org/
• Enrichment analysis
Gene Ontology (GO) biological process and
pathway enrichment analysis of DEGs
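Enrichment analysis of differentially expressed genes (DEGs) is often framed as a hypergeometric test: given how many genes in the background carry a GO term, how surprising is the number seen in the DEG list? The sketch below shows that calculation under this assumption. The slides do not say which test or tool was used, and apart from the list size of 50 taken from the slides, all numbers are hypothetical.

```python
from math import comb

def enrichment_pvalue(hits, list_size, annotated, background):
    """One-sided hypergeometric p-value, P(X >= hits).

    hits       -- genes in the list that carry the GO term
    list_size  -- number of genes in the list (e.g. DEGs)
    annotated  -- genes in the background that carry the GO term
    background -- total number of genes in the background
    """
    total = comb(background, list_size)
    upper = min(list_size, annotated)
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total

# Hypothetical example: 50 DEGs, 8 of them annotated with a term that
# covers 400 of 20000 background genes (1 expected by chance).
p = enrichment_pvalue(hits=8, list_size=50, annotated=400, background=20000)
print(f"p-value = {p:.3g}")
```

When many GO terms are tested, the p-values would normally be corrected for multiple testing, which is outside the scope of this sketch.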

Gene Ontology (Biological Process)
• GO of 50 genes that are positively significant
[Pie chart legend; the legible slice labels (2.1 %, 2.2 %, 4.3 %) could not be matched to categories]
biological regulation (GO:0065007)
cell proliferation (GO:0008283)
cellular component organization or biogenesis (GO:0071840)
cellular process (GO:0009987)
developmental process (GO:0032502)
localization (GO:0051179)
metabolic process (GO:0008152)
multicellular organismal process (GO:0032501)
reproduction (GO:0000003)
response to stimulus (GO:0050896)
signaling (GO:0023052)
BASIC BIOINFORMATICS
TOOLS FOR RESEARCH
https://blast.ncbi.nlm.nih.gov/Blast.cgi

BLASTP
RESULT ANALYSIS
https://www.ebi.ac.uk/Tools/msa/clustalo/
HOMOLOGY MODELING

• https://swissmodel.expasy.org/
SWISS-MODEL
http://www.swissdock.ch/
https://www.schrodinger.com/
• ONE-STEP SOLUTION FOR MOLECULAR MODELING AND DRUG
DESIGN
1. -7.918
2. -7.194
https://www.bioconductor.org/

REVIEW

• GENOMICS
• PROTEOMICS
• TRANSCRIPTOMICS
• METABOLOMICS
• MOLECULAR INTERACTION
• PATHWAY STUDY
• GENE ONTOLOGY
• MOLECULAR MODELING
• DRUG DESIGN
Stay Safe and Keep Browsing

ryadav@lko.amity.edu
