WINSEM2022-23 BBIT205L TH VL2022230501228 Reference Material I 19-12-2022 Introduction To Bioinformatics
What Is Bioinformatics?
Bioinformatics is the application of computer technology to the
which will certainly reveal the mysteries of life and bring
cures for many diseases.
Bioinformatics has been an inevitable part of Biotechnology in
Genome Project.
Need for Bioinformatics
The number of entries in databases of gene sequences is
increasing exponentially.
Bioinformaticians are needed to understand and use this
information.
Genome sequencing projects, including the Human Genome
Project, are producing vast amounts of information. The
challenge is to use this information in a useful way.
The huge amount of sequence information available
cannot be analyzed manually. Computers have become
essential tools in the study of this information. This was
predicted by Walter Gilbert, who shared a Nobel Prize for
his contribution to the development of DNA sequencing.
He emphasized that the focus of molecular biology must shift
towards computer-assisted biological sequence analysis.
How are we going to handle the large amount of
biological data or information?
It is possible with bioinformatics and the Internet.
Computers & Bioinformatics
The number of software products is growing constantly, so
a complete list is impossible: software developers working
in the life sciences (or life scientists with software
development talents) are constantly updating existing
applications and producing useful new ones.
Bioinformatics is typically associated with massive
databases of gene and protein sequences and of
structure/function information.
Newly discovered sequences, structures or protein/gene
functions are searched (compared) against what is
already known, gathered, and deposited into the databases.
These searches are done by remote computer access using
various bioinformatics tools.
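The idea of comparing a new sequence against what is already known can be sketched with a toy k-mer index. This is only an illustration under stated assumptions: the database entries, their names and the choice of k = 4 are made up, and real tools such as BLAST use far more refined scoring and statistics.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(database, k):
    """Map each k-mer to the names of the database entries that contain it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def search(query, index, k):
    """Rank database entries by the number of k-mers shared with the query."""
    hits = defaultdict(int)
    for kmer in kmers(query, k):
        for name in index.get(kmer, ()):
            hits[name] += 1
    # Highest shared count first; ties broken by entry name.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy "database" of known sequences (made-up data).
toy_db = {
    "seqA": "ATGGCGTACGTTAGC",
    "seqB": "TTTTCCCCGGGGAAAA",
    "seqC": "ATGGCGTACGATAGC",
}
toy_index = build_index(toy_db, k=4)
results = search("GGCGTACGTT", toy_index, k=4)
print(results)
```

Entries that share no k-mer with the query never appear in the result, which is why an index of this kind scales better than comparing the query against every stored sequence in full.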
Definitions
The application of computational techniques to solve
biological problems.
Application of information technology to the storage,
management and analysis of biological information
Bioinformatics is the unified discipline formed from
the combination of biology, computer science, and
information technology.
“The mathematical, statistical and computing methods
that aim to solve biological problems using DNA and
amino acid sequences and related information”.
The branch of science concerned with information
and information flow in biological systems, esp. the
use of computational methods in genetics and
genomics.
The marriage between computer science and
molecular biology
The algorithms and techniques of computer science
are being used to solve the problems faced by
molecular biologists
‘Information technology applied to the management
and analysis of biological data’
Storage and Analysis are two of the important
functions – bioinformaticians build tools for each
Bioinformatics is the use of computers to solve biological
and biomedical problems.
[Figure: Bioinformatics at the overlap of Computer Science and Statistics]
Bioinformatics: Integration of several fields
[Figure: Bioinformatics at the center of Physics, Computer Science, Biological Science, Mathematics, Chemistry and Statistics]
The field of science in which biology, computer science and
information technology merge into a single discipline
Biologists: collect molecular data (DNA and protein sequences,
gene expression, etc.).
Bioinformaticians: study biological questions by analyzing
molecular data.
Computer scientists (plus mathematicians, statisticians, etc.):
develop tools, software and algorithms to store and analyze
the data.
[Figure: Bioinformatics Hub]
Use of Bioinformatics
Analysis and interpretation of various types of biological
data including: nucleotide and amino acid sequences,
protein domains, and protein structures.
Development of new algorithms and statistics with which
to assess biological information, such as relationships
among members of large data sets.
Development and implementation of tools that enable
efficient access and management of different types of
information, such as various databases, integrated
mapping information.
Use of Bioinformatics
DNA analysis
Genome sequencing
Sequence assembly
Sequence/gene annotations
Gene finding/Sequence translation tools
Sequence similarity searching and alignment (e.g., BLAST, ClustalW)
Comparison between genomes
Drug designing
Sequence
Sequence similarity
Conserved motifs
Protein Evolution
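Two of the simplest measurements behind the DNA analysis and sequence similarity items above can be sketched as follows. This is a minimal illustration, assuming the two sequences have already been aligned to the same length; the function names and example sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions that carry the same residue.

    Assumes the two sequences are already aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def gc_content(dna):
    """Percentage of bases in a DNA string that are G or C."""
    if not dna:
        raise ValueError("sequence must not be empty")
    dna = dna.upper()
    return 100.0 * sum(1 for base in dna if base in "GC") / len(dna)


# Made-up example sequences that differ at one position out of eight.
print(percent_identity("ATGCATGC", "ATGCTTGC"))
print(gc_content("ATGCATGC"))
```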
Use of Bioinformatics (contd.)
Other uses:
Drug designing
Vaccine development
Dairy technology
Forensics
Crop improvement
Designing enzymes for detergents
Genetic counseling
Computational Goals of Bioinformatics
Learn & Generalize: Discover conserved patterns (models) of sequences,
structures, interactions, metabolism & chemistries from well-studied
examples.
Prediction: Infer function or structure of newly sequenced genes, genomes,
proteins or proteomes from these generalizations.
Organize & Integrate: Develop a systematic and genomic approach to
molecular interactions, metabolism, cell signaling, gene expression…
Simulate: Model gene expression, gene regulation, protein folding, protein-
protein interaction, protein-ligand binding, catalytic function, metabolism…
Engineer: Construct novel organisms or novel functions or novel regulation
of genes and proteins.
Gene Therapy: Target specific genes or mutations, for example with RNAi,
to change a disease phenotype.
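The "Learn & Generalize" goal, discovering conserved patterns from well-studied examples, can be sketched with a consensus sequence built from a small alignment. This is an illustrative sketch only: the aligned sequences are made up, and real motif discovery uses statistical models rather than a simple majority vote.

```python
from collections import Counter


def column_counts(aligned):
    """Count the residues in each column of equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all sequences must have the same length")
    return [Counter(column) for column in zip(*aligned)]


def consensus(aligned):
    """Most frequent residue per column; ties go to the alphabetically first."""
    return "".join(
        min(counts, key=lambda residue: (-counts[residue], residue))
        for counts in column_counts(aligned)
    )


def conserved_positions(aligned):
    """Zero-based indices of columns where every sequence agrees."""
    return [i for i, counts in enumerate(column_counts(aligned)) if len(counts) == 1]


# Made-up alignment of four short sequences.
examples = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
print(consensus(examples))
print(conserved_positions(examples))
```

The consensus is the generalization step; checking whether a new sequence matches it is the simplest form of the prediction step listed above.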
Recent events making bioinformatics
more important
Exponential expansion of biological information
Expansion of multiple types of information
Cheaper high throughput technologies
Improvement in computation power
Lack of standards/quality
Need for micro and macro analysis
Need for better algorithms
As the definitions suggest, bioinformatics has two parts:
the biology part and the computer part.
The Biology part can be discussed under three
subheadings:
Genomics
Proteomics and
Drug Designing
What is Genomics?
Genome: the complete set of genetic instructions for making an
organism.
Genomics is the study of whole genomes of
organisms and incorporates elements from genetics.
It covers any attempt to analyze or compare the entire
genetic complement of a species.
Early genomics was mostly recording genome
sequences
Genomics is an interdisciplinary field of biology
focusing on the structure, function, evolution,
mapping and editing of genomes.
Genomics uses a combination of recombinant DNA, DNA
sequencing methods, and bioinformatics to sequence,
assemble and analyse the structure and function of
genomes.
Genomics harnesses the availability of complete DNA
sequences for entire organisms and was made possible by
both the pioneering work of Fred Sanger and the more
recent next-generation sequencing technology.
Fred Sanger's group established techniques of
sequencing, genome mapping, data storage, and
bioinformatic analyses in the 1970s and 1980s. This
work paved the way for the human genome project in
the 1990s, an enormous feat of global collaboration
that culminated in the publication of the complete
human genome sequence in 2003.
Today, next-generation sequencing technologies have
led to spectacular improvements in the speed, capacity
and affordability of genome sequencing.
Moreover, advances in bioinformatics have enabled
hundreds of life-science databases and projects that
provide support for scientific research.
Information stored and organised in these databases
can easily be searched, compared and analysed.
History of Genomics
1977
First complete genome sequence for an organism is
published:
ΦX174 (bacteriophage), 5,386 base pairs coding for nine proteins
(~5 kb)
1995
Haemophilus influenzae genome sequenced (bacterium,
1.8 Mb)
1996
Saccharomyces cerevisiae (baker's yeast, 12.1 Mb)
1997
E. coli (4.7 Mb)
2000
Pseudomonas aeruginosa (6.3 Mb)
A. thaliana genome (100 Mb)
D. melanogaster genome (180 Mb)
Genes carry the information for making all of the proteins required by the body for
growth and maintenance.
The genome also encodes rRNA and tRNA which are involved in protein synthesis.
The human genome was once estimated to contain ~35,000-50,000 genes coding for
functional proteins in the body; current estimates are closer to 20,000
protein-coding genes.
It includes non-coding sequences located between genes, which make up the vast
majority of the DNA in the genome (~95%)
The particular order of nucleotide bases (As, Gs, Cs, and Ts) determines the amino
acid composition of proteins
The average gene consists of 3000 bases, but sizes vary greatly, with the largest
Almost all (99.9%) nucleotide bases are exactly the same in all people. Almost
half of all human proteins share similarities with those of other organisms,
underscoring the unity of life
The functions are unknown for over 50% of discovered genes. About 75% of the
human genome is “junk”
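The statement above that the order of nucleotide bases determines the amino acids of a protein can be made concrete with a small translation sketch. It uses the standard genetic code, reads the DNA coding strand in frame from the first base, and stops at the first stop codon; the function name and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# for the first, second and third base. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence into one-letter amino acid codes.

    Reading starts at the first base and ends at the first stop codon
    or when fewer than three bases remain.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTTTTAA"))  # ATG GCC TTT TAA
print(translate("ATGGACTTTTAA"))  # one base changed in the second codon
```

Changing a single base in the second codon (GCC to GAC) changes the second amino acid from alanine to aspartate, which is the sense in which base order determines protein composition.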
The genome is our Genetic Blueprint
Genes
Functional Genomics
Uses direct, large-scale methods of identifying gene
functions and associations
(for example, yeast two-hybrid methods)
Structural Genomics
Emphasizes high-throughput, whole-genome
analysis.
Topics: the current state and
future plans of structural genomics efforts around the world,
and the possible benefits of this research
What Is Proteomics?
Proteomics is the study of the proteome—the “PROTEin
complement of the genOME”
More specifically, “the qualitative and quantitative
comparison of proteomes under different conditions to
further unravel biological processes”
The terms proteome and proteomics were coined by Mark
Wilkins and colleagues in the early 1990’s
Definition: it is not just protein biochemistry!
Proteome: the complement of proteins found in a
single cell in a particular environment; also, the complete
collection of proteins encoded by the genome of an
organism.
Proteomics: the study of the composition, structure,
function and interaction of the proteins directing the
activities of each living cell.
The level of any protein in the cell at any given time
is controlled by:
1. The rate of transcription of the gene
2. The efficiency of translation of mRNA into protein
3. The rate of degradation of the protein in the cell
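How these three factors combine can be sketched with a deliberately simple rate model. The assumptions here are mine, not the slides': production equals transcription rate times translation efficiency, degradation is proportional to the current protein level, and the numeric values and units are made up for illustration.

```python
def steady_state_level(transcription_rate, translation_efficiency, degradation_rate):
    """Protein level at which production and degradation balance.

    Assumed model: d(level)/dt = transcription_rate * translation_efficiency
                                 - degradation_rate * level
    """
    return transcription_rate * translation_efficiency / degradation_rate


def simulate(transcription_rate, translation_efficiency, degradation_rate,
             hours, dt=0.01, start=0.0):
    """Integrate the same model with simple Euler steps of size dt (hours)."""
    level = start
    production = transcription_rate * translation_efficiency
    for _ in range(int(round(hours / dt))):
        level += dt * (production - degradation_rate * level)
    return level


# Made-up values: 2 transcripts/hour, 50 proteins per transcript,
# and 0.5 of the protein pool degraded per hour.
print(steady_state_level(2.0, 50.0, 0.5))
print(simulate(2.0, 50.0, 0.5, hours=40.0))
```

Under this model, doubling the degradation rate halves the steady-state level, while doubling either the transcription rate or the translation efficiency doubles it.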
Protein structure:
Primary structure: the sequence of specific amino acids
Secondary structure: the primary polypeptide chain folds
into alpha-helices, beta-pleated
sheets, random coils and turns
Tertiary structure: secondary structure elements interact with each
other chemically to form the three-dimensional shape of the
protein
Quaternary structure: the interaction between different
polypeptide units
Relationship between structure and function :
Hydrophobicity is determined by primary and secondary
structure
E.g., membrane-spanning regions of membrane proteins
are typically alpha helices made of hydrophobic amino acids, which
interact with hydrophobic lipids, forming a stable membrane
structure
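The link between sequence and hydrophobicity can be illustrated with a sliding-window hydropathy profile. This sketch uses the Kyte-Doolittle hydropathy scale; the window of 19 residues and the cutoff of 1.6 are a commonly quoted rule of thumb for spotting possible membrane-spanning segments, not something stated in these slides, and the peptide is made up.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Average hydropathy of every window of the given length."""
    scores = [KYTE_DOOLITTLE[residue] for residue in protein.upper()]
    if window > len(scores):
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_windows(protein, window=19, cutoff=1.6):
    """Start positions of windows whose average exceeds the assumed cutoff."""
    profile = hydropathy_profile(protein, window)
    return [i for i, score in enumerate(profile) if score > cutoff]


# Made-up peptide: a leucine-rich stretch flanked by charged residues.
peptide = "KKKK" + "L" * 19 + "DDDD"
profile = hydropathy_profile(peptide)
print(max(profile))
print(candidate_windows(peptide))
```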
Proteomics
is used to investigate:
when and where proteins are expressed;
rates of protein production, degradation, and steady-state abundance;
how proteins are modified (for example, post-translational modifications
(PTMs) such as phosphorylation);
the movement of proteins between subcellular compartments;
the involvement of proteins in metabolic pathways;
how proteins interact with one another.
can provide significant biological information for many biological problems,
such as:
Which proteins interact with a particular protein of interest (for example,
the tumor suppressor protein p53)?
Which proteins are localized to a subcellular compartment (for example,
the mitochondrion)?
Which proteins are involved in a biological process (for example,
circadian rhythm)?
Drug designing
It is an important aspect in Bioinformatics in
which we identify the target for a drug and
synthesize suitable inhibitors.
Computer-aided drug design
Cuts the cost and time of drug discovery considerably
It is possible to select candidate drug molecules from huge
available databases and check, using docking procedures,
whether they can bind to the active site
Docking programs such as Hex, ArgusLab and AutoDock
can dock small molecules to selected active sites
of target molecules and give each molecule a relative score.
Molecules predicted computationally in this way are then
passed on to the wet lab for synthesis and clinical trials
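The last step, turning relative docking scores into a shortlist for the wet lab, can be sketched as a simple ranking. The ligand names and scores below are made up, and the convention that a lower (more negative) score means better predicted binding is an assumption; scoring conventions differ between docking programs.

```python
def rank_candidates(scores, top_n):
    """Return the top_n (name, score) pairs, best first.

    Assumes lower scores indicate better predicted binding.
    """
    return sorted(scores.items(), key=lambda item: item[1])[:top_n]


# Hypothetical docking scores for four candidate molecules (made-up data).
hypothetical_scores = {
    "ligand_A": -7.2,
    "ligand_B": -5.1,
    "ligand_C": -9.4,
    "ligand_D": -6.3,
}
shortlist = rank_candidates(hypothetical_scores, top_n=2)
print(shortlist)
```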
E.g., the human body produces the required proteins P1, P2, P3
A pathogenic bacterium or virus produces its own protein (X)
X can bind to P1 (causing disease)
To stop the disease, introduce a molecule Y; Y binds
to X, and P1 is released
The process of designing a new drug using bioinformatics tools
has opened a new area of research.
Computational techniques assist in searching for drug targets
and in designing drugs in silico, but the process still takes a long time
and a lot of money. To design a new drug, one needs to follow this path:
1. Identify target disease
2. Study Interesting Compounds
3. Detect the Molecular Basis of the Disease
4. Rational Drug Design Techniques
5. Refinement of Compounds
6. Quantitative Structure Activity Relationships (QSAR)
7. Solubility of Molecule
8. Drug Testing
Bioinformatics Analysis?
It is like any other lab analysis!
You need to know your data/input sources
You need to understand your methods and their assumptions
You need a plan to get from point A to point B
You need to understand your equipment
You need to be critical and understand potential sources of
error
You need to interpret your results
Your results need to be reproducible
Your results should be testable
SKILLS REQUIRED FOR A BIOINFORMATICIAN
◦ Operating system:
Controls and coordinates the use of hardware among various
applications and users
◦ Application programs: define the ways in which the system
resources are used to solve the computing problems of the users
Word processors, compilers, web browsers, database systems,
video games
◦ Users
People, machines, other computers
In simple terms, an OS is a manager.
It manages all the available resources on a
computer. These resources can be the hard
disk, a printer, or the monitor screen; even
memory is a resource that needs to be
managed.
Examples: Unix, Linux, DOS, IRIX and Windows.
Relational Database Management
Systems (RDBMS)
An RDBMS is a DBMS in which data is stored in the form of tables and
the relationships among the data are also stored in the form of
tables.
A DBMS is a set of computer programs that controls the
creation, maintenance and use of a database.
A DBMS is a system software package that supports the use of
integrated collections of data records and files known as
databases.
A DBMS is a software program that enables the creation
and management of databases.
A DBMS allows users and other software to store and
retrieve data in a computer.
Oracle
Sybase
DB2 (IBM)
MS SQL Server
MS Access
Ingres
PostgreSQL
MySQL
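A minimal sketch of the relational idea, data in tables with the relationships between them also held in tables, using SQLite through Python's built-in sqlite3 module. The table names, column names and numeric identifiers are hypothetical; SQLite is used here only because it needs no server, and it is not one of the products listed above.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# One table per kind of record; protein.gene_id links each protein to its gene.
conn.execute(
    "CREATE TABLE gene ("
    " gene_id INTEGER PRIMARY KEY,"
    " symbol TEXT NOT NULL,"
    " organism TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE protein ("
    " protein_id INTEGER PRIMARY KEY,"
    " gene_id INTEGER NOT NULL REFERENCES gene(gene_id),"
    " name TEXT NOT NULL)"
)

# Identifier values are arbitrary and for illustration only.
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?)",
    [(1, "TP53", "Homo sapiens"), (2, "lacZ", "Escherichia coli")],
)
conn.executemany(
    "INSERT INTO protein VALUES (?, ?, ?)",
    [(10, 1, "Cellular tumor antigen p53"), (11, 2, "Beta-galactosidase")],
)

# Follow the relationship between the two tables with a join.
rows = conn.execute(
    "SELECT gene.symbol, protein.name"
    " FROM gene JOIN protein ON protein.gene_id = gene.gene_id"
    " WHERE gene.organism = ?"
    " ORDER BY gene.symbol",
    ("Homo sapiens",),
).fetchall()
print(rows)
```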
Programming Languages
A set of rules and symbols used to construct
a computer program.
A language used to interact with the
computer.
A programming language (PL) is an artificial language designed to
express computations that can be performed
by a machine, particularly a computer.
Examples: C, C++, Java, VB, Fortran, Python and Perl.