
Bioinformatics

What Is Bioinformatics?
Bioinformatics is the application of computer technology to the
field of biology and medicine.

• Bioinformatics has become one of the fastest growing fields, and it
promises to help unravel the mysteries of life and contribute to
cures for many diseases.
• Bioinformatics has become an indispensable part of biotechnology in
the current era.

Bioinformatics is the use of computers, software tools, and
databases to handle biological information.

Bioinformatics is the comprehensive application of mathematics, statistics,
biochemistry, biophysics and computer algorithms to analyze biological data.

Bioinformatics is the creation of data algorithms and specialized computer
software to identify and classify components of a biological system, such as
DNA and protein sequences.
Bioinformatics is a rapidly developing branch of biology, highly
interdisciplinary, using techniques and concepts from
• informatics,
• statistics,
• mathematics,
• chemistry,
• biochemistry,
• physics,
• linguistics.
What Is Bioinformatics?
Bioinformatics is a science straddling the domains of
biomedicine, informatics, mathematics and statistics.
Bioinformatics is the application of computational techniques
to biological data.
Bioinformatics is a combination of molecular biology
and computer science.
Bioinformatics is the use of computational approaches
to analyse, manage and store biological data.
Bioinformatics has become very important in the field
of Biotechnology.
All the information processed in biotechnology is
stored and analyzed using bioinformatics.
Bioinformatics impacts on all aspects of biological
research.
 Bioinformatics is used in drug designing and drug
development.
What Is Bioinformatics?

An emerging interdisciplinary research area.

The term Bioinformatics was first published in 1991.

Bioinformatics made the completion of the Human Genome Project
possible.

The name is a shortened form of “biological informatics”.


What Is Bioinformatics?

Bioinformatics is a multifaceted discipline combining many scientific fields
including computational biology, statistics, mathematics, molecular biology and
genetics (Fenstermacher, 2005, p. 440).
Path to the Bioinformatics
1st, learn biology.
2nd, decide on and pick a problem that interests you for experiment.
3rd, find and learn about the bioinformatics tools.
4th, learn computer programming languages: Perl, Python, R, Java, etc.
5th, experiment on your computer and learn different programming
techniques.

Need for Bioinformatics
The number of entries in databases of gene sequences is
increasing exponentially.
Bioinformaticians are needed to understand and use this
information.
Genome sequencing projects, including the Human Genome
Project, are producing vast amounts of information. The
challenge is to use this information in a useful way.
The huge amount of sequence information available
cannot be analyzed manually. Computers have become
essential tools in the study of this information. This was
predicted by Walter Gilbert, who won a Nobel Prize for
his contribution to the development of DNA sequencing.
He emphasized that the focus of molecular biology must shift
towards computer-assisted biological sequence analysis.
How are we going to handle the large amount of
biological data or information?
It is possible with bioinformatics and the Internet.
Computers & Bioinformatics

Bioinformatics is the computer-assisted data
management discipline that helps us gather, store,
analyze and integrate biological and genetic information
(data), and represent this information efficiently.
Bioinformatics experts claim that ‘Bioinformatics is the
electronic infrastructure of molecular biology’
There are many different bioinformatics tools available
over the Internet free of charge to whomever wishes to
use them.
There are also many commercial software packages
used in bioinformatics by researchers who can afford them.

The number of software products is growing constantly, so
it is impossible to list them all: software developers working
in the life sciences (or life scientists with software
development talents) are constantly updating existing
applications and producing useful new ones.
Bioinformatics is typically associated with massive
databases of gene and protein sequences and of
structure/function information.
Newly discovered sequences, structures or protein/gene functions
are searched (compared) against what is already known
and then deposited into the databases.
These searches are done by remote computer access using
various bioinformatics tools.
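The compare-against-what-is-known step described above can be illustrated with a toy example. The sketch below is not BLAST or any other real search tool: it simply ranks the entries of a small in-memory "database" by the fraction of short subsequences (k-mers) they share with a query. The entry names, sequences and function names are all made up for illustration.

```python
def kmers(seq, k=3):
    """Return the set of all overlapping substrings of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a, b, k=3):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def search(query, database, k=3):
    """Rank database entries from most to least similar to the query."""
    scored = [(name, kmer_similarity(query, seq, k))
              for name, seq in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical entries; a real sequence database holds millions of records.
database = {
    "entry_A": "ATGGCGTACGTTAGC",
    "entry_B": "ATGGCGTACGTTAGG",
    "entry_C": "TTTTTTTTTTTTTTT",
}

hits = search("ATGGCGTACGTTAGC", database)
for name, score in hits:
    print(f"{name}\t{score:.2f}")
```

Real tools use alignment-based scoring and statistical significance estimates rather than a plain set overlap; the point here is only the search, compare and rank pattern.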
Definitions
The application of computational techniques to solve
biological problems.
Application of information technology to the storage,
management and analysis of biological information
Bioinformatics is the unified discipline formed from
the combination of biology, computer science, and
information technology.
“The mathematical, statistical and computing methods
that aim to solve biological problems using DNA and
amino acid sequences and related information”.
The branch of science concerned with information
and information flow in biological systems, esp. the
use of computational methods in genetics and
genomics.
The marriage between computer science and
molecular biology
The algorithms and techniques of computer science
are being used to solve the problems faced by
molecular biologists
‘Information technology applied to the management
and analysis of biological data’
Storage and Analysis are two of the important
functions – bioinformaticians build tools for each
• Bioinformatics is the use of computers to solve biological
and biomedical problems.
• Bioinformatics is the application of information technology
to mine, visualize, analyze, integrate, and manage
biological and genetic information, which can then be
applied in, among other things, accelerating drug discovery
and development.
• Application of tools of computation and analysis to the
capture and interpretation of biological data.
• Biological data management and analysis.


[Figure: Bioinformatics at the overlap of biology, chemistry, computer science and statistics]
Bioinformatics: Integration of several fields
[Figure: Bioinformatics at the centre of physics, computer science, biological science, mathematics, chemistry and statistics]
The field of science in which biology, computer science and
information technology merge into a single discipline
Bioinformatics Hub
Biologists collect molecular data: DNA and protein sequences,
gene expression, etc.
Bioinformaticians study biological questions by analyzing
molecular data.
Computer scientists (plus mathematicians, statisticians, etc.)
develop tools, software and algorithms to store and analyze the data.

Use of Bioinformatics
Analysis and interpretation of various types of biological
data including: nucleotide and amino acid sequences,
protein domains, and protein structures.
Development of new algorithms and statistics with which
to assess biological information, such as relationships
among members of large data sets.
Development and implementation of tools that enable
efficient access and management of different types of
information, such as various databases, integrated
mapping information.
Use of Bioinformatics
DNA analysis
Genome sequencing
• Sequence assembly
• Sequence/gene annotations
• Gene finding/sequence translation tools
• Sequence similarity searching (e.g. BLAST, ClustalW)
• Comparison between genomes
• Evolution of sequences (phylogenetic analysis)
• Gene expression
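As a small illustration of what a sequence translation tool from the list above does, the following sketch translates a DNA sequence into protein using the standard genetic code and also reports GC content. It is a minimal example, not any particular tool's implementation; the function names and the example sequence are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons enumerated
# in T, C, A, G order at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gc_content(dna):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return (dna.count("G") + dna.count("C")) / len(dna)


def translate(dna):
    """Translate reading frame 1 until a stop codon or the end of the sequence.

    Codons containing characters other than A, C, G, T become 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


example = "ATGGCCTGGTAA"  # made-up sequence: Met, Ala, Trp, stop
print(translate(example), round(gc_content(example), 2))
```

Real gene-finding tools also scan all six reading frames and model start sites, introns and other signals; this sketch covers only the codon lookup itself.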
Use of Bioinformatics (..contd.)
Protein analysis
Structure
• X-ray crystallography
• Homology-based models
• Drug designing
Sequence
• Sequence similarity
• Protein family assignments
• Conserved motifs
• Proteomics data analysis
• Protein evolution
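Searching a protein sequence for a conserved motif, one of the sequence-level tasks listed above, can be sketched with a regular expression. The example uses the well-known N-glycosylation pattern N-{P}-[ST]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline). The protein sequence and the function name are made up for illustration.

```python
import re

# N-{P}-[ST]-{P} written as a regular expression. The lookahead lets
# overlapping occurrences be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein, pattern=N_GLYCOSYLATION):
    """Return (1-based position, matched residues) for every motif hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]


hypothetical_protein = "MKNGSAANPTLLNVTQ"
for position, residues in find_motif(hypothetical_protein):
    print(position, residues)
```

A match only marks a candidate site; whether the site is actually modified in the cell has to be established experimentally.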
Use of Bioinformatics (..contd.)
Other uses:
Drug designing
Vaccine development
Dairy technology
Forensics
Crop improvement
Designing enzymes for detergents
Genetic counseling
Computational Goals of Bioinformatics
 Learn & Generalize: Discover conserved patterns (models) of sequences,
structures, interactions, metabolism & chemistries from well-studied
examples.
 Prediction: Infer function or structure of newly sequenced genes, genomes,
proteins or proteomes from these generalizations.
 Organize & Integrate: Develop a systematic and genomic approach to
molecular interactions, metabolism, cell signaling, gene expression…
 Simulate: Model gene expression, gene regulation, protein folding, protein-
protein interaction, protein-ligand binding, catalytic function, metabolism…
 Engineer: Construct novel organisms or novel functions or novel regulation
of genes and proteins.
 Gene Therapy: Target specific genes, or mutations, RNAi to change a
disease phenotype.
Recent events making bioinformatics
more important
Exponential expansion of biological information
Expansion of multiple types of information
Cheaper high throughput technologies
Improvement in computation power
Lack of standards/quality
Need for micro and macro analysis
Need for better algorithms
As the definition suggests there are two parts involved in
it namely the biology part and the computer part.
The Biology part can be discussed under three
subheadings:
Genomics
Proteomics and
Drug Designing
What is Genomics?
Genome: the complete set of genetic instructions for making an
organism.
Genomics is the study of whole genomes of
organisms and incorporates elements from genetics.
It covers any attempt to analyze or compare the entire
genetic complement of a species.
Early genomics was mostly recording genome
sequences
Genomics is an interdisciplinary field of biology
focusing on the structure, function, evolution,
mapping and editing of genomes.
Genomics uses a combination of recombinant DNA, DNA
sequencing methods, and bioinformatics to sequence,
assemble and analyse the structure and function of
genomes.
Genomics harnesses the availability of complete DNA
sequences for entire organisms and was made possible by
both the pioneering work of Fred Sanger and the more
recent next-generation sequencing technology.
Fred Sanger's group established techniques of
sequencing, genome mapping, data storage, and
bioinformatic analyses in the 1970s and 1980s. This
work paved the way for the human genome project in
the 1990s, an enormous feat of global collaboration
that culminated in the publication of the complete
human genome sequence in 2003.
Today, next-generation sequencing technologies have
led to spectacular improvements in the speed, capacity
and affordability of genome sequencing.
Moreover, advances in bioinformatics have enabled
hundreds of life-science databases and projects that
provide support for scientific research.
Information stored and organised in these databases
can easily be searched, compared and analysed.
History of Genomics
1980
First complete genome sequence for an organism is
published
 FX174 - 5,386 base pairs coding nine proteins.
 ~5Kb
1995
Haemophilus influenzae genome sequenced (flu bacteria,
1.8 Mb)
1996
Saccharomyces cerevisiae (baker's yeast, 12.1 Mb)
1997
E. coli (4.7 Mbp)
2000
Pseudomonas aeruginosa (6.3 Mbp)
A. thaliana genome (100 Mb)
D. melanogaster genome (180Mb)

2001: The Big One
The Human Genome sequence is published (3 Gb)
What is the Human Genome?
 The entire genetic makeup of the human cell nucleus.

 Genes carry the information for making all of the proteins required by the body for
growth and maintenance.

 The genome also encodes rRNA and tRNA which are involved in protein synthesis.
Made up of ~35,000-50,000 genes which code for functional proteins in the body

Includes non-coding sequences located between genes, which makes up the vast
majority of the DNA in the genome (~95%)

The particular order of nucleotide bases (As, Gs, Cs, and Ts) determines the amino
acid composition of proteins

Information about DNA variations (polymorphisms) among individuals can lend
insight into new technologies for diagnosing, treating, and preventing diseases that
afflict humankind.
What Goals Were Established for the Human Genome
Project When it Began in 1990?
• Identify all of the genes in human DNA
• Determine the sequence of the 3 billion chemical nucleotide
bases that make up human DNA
• Store this information in databases
• Develop faster, more efficient sequencing technologies
• Develop tools for data analysis
• Address the ethical, legal, and social issues (ELSI) that
arise from the project
Milestone
1990: Project initiated as a joint effort of the U.S. Department of Energy
and the National Institutes of Health
June 2000: Completion of a working draft of the entire human
genome (covers >90% of the genome to a depth of 3-4x redundant
sequence)
February 2001: Analyses of the working draft are published
April 2003: HGP sequencing is completed and the Project is declared
finished two years ahead of schedule

• The human genome contains 3 billion chemical nucleotide bases (A, C, T, and
G)
• The average gene consists of 3000 bases, but sizes vary greatly, with the largest
known human gene being dystrophin at 2.4 million bases
• The total number of genes is estimated at around 30,000, much lower than
previous estimates of 80,000 to 140,000. Only about 2% of the human genome
contains genes, which are the instructions for making proteins
• Almost all (99.9%) nucleotide bases are exactly the same in all people. Almost
half of all human proteins share similarities with other organisms, underscoring
the unity of life
• The functions are unknown for over 50% of discovered genes. About 75% of the
human genome is “junk”
The genome is our Genetic Blueprint

• Nearly every human cell contains 23 pairs of chromosomes
  • 1 - 22 and XY or XX
  • XY = Male
  • XX = Female
• Length of chr 1-22, X, Y together is ~3.2 billion bases
(about 2 meters diploid)
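The "about 2 meters" figure can be checked with a short calculation. The sketch below assumes roughly 0.34 nm of length per base pair, the commonly quoted value for B-form DNA; that constant is not given on the slide and is an assumption of this example.

```python
HAPLOID_BASE_PAIRS = 3.2e9       # chr 1-22, X and Y together (from the slide)
RISE_PER_BASE_PAIR_NM = 0.34     # assumed: approximate rise per base pair, B-form DNA

# One copy of the genome, converted from nanometres to metres.
haploid_length_m = HAPLOID_BASE_PAIRS * RISE_PER_BASE_PAIR_NM * 1e-9

# A diploid cell carries two copies.
diploid_length_m = 2 * haploid_length_m

print(f"haploid: {haploid_length_m:.2f} m, diploid: {diploid_length_m:.2f} m")
```

Under these assumptions the diploid total comes out slightly above 2 m, consistent with the figure quoted above.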
The Genome is Who We Are on the inside!

• Chromosomes consist of DNA
  • molecular strings of A, C, G, & T
  • base pairs, A-T, C-G
• Genes
  • DNA sequences that encode proteins
  • less than 3% of human genome
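The A-T and C-G pairing rule means that one strand of DNA determines the other. A minimal sketch of computing the opposite strand (the reverse complement) follows; the function name and example sequences are chosen for illustration.

```python
# Pairing rule from the slide: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))
```

The reversal is needed because the two strands run in opposite directions, so the partner of the first base is the last base of the other strand.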
What next?
Post Genomic era
Comparative Genomics
Functional Genomics
Structural Genomics

Researchers hope to find quick ways to identify the DNA
regions associated with common complex diseases
They also hope to understand how genetic variation contributes
to responses to environmental factors
The development of new methods to accelerate genome work
Comparative Genomics
the management and analysis of the millions of data points
that result from Genomics
Sorting out the mess

Functional Genomics
Other, more direct, large-scale ways of identifying gene
functions and associations
(for example yeast two-hybrid methods)
Structural Genomics
emphasizes high-throughput, whole-genome
analysis.
outlines the current state and
future plans of structural genomics efforts around the world
and describes the possible benefits of this research
What Is Proteomics?
Proteomics is the study of the proteome—the “PROTEin
complement of the genOME”
More specifically, “the qualitative and quantitative
comparison of proteomes under different conditions to
further unravel biological processes”
The terms proteome and proteomics were coined by Mark
Wilkins and colleagues in the early 1990’s
Definition: It is not just protein biochemistry!
Proteome: the complement of proteins found in a
single cell in a particular environment; more broadly, the complete
collection of proteins encoded by the genome of an
organism.
Proteomics: the study of the composition, structure,
function and interaction of the proteins directing the
activities of each living cell.
The level of any protein in the cell at any given time
is controlled by:
1. The rate of transcription of the gene
2. The efficiency of translation of mRNA into protein
3. The rate of degradation of the protein in the cell
Protein structure:
• Primary structure: the specific sequence of amino acids
• Secondary structure: the primary polypeptide chain folds
into alpha helices, beta-pleated sheets, random coils and turns
• Tertiary structure: secondary-structure elements interact with each
other chemically to form the three-dimensional shape of the
protein
• Quaternary structure: interaction between different
polypeptide units
Relationship between structure and function:
• Hydrophobicity is determined by primary and secondary
structure
E.g. membrane-spanning regions of membrane proteins
are typically alpha helices made of hydrophobic amino acids, which
interact with hydrophobic lipids to form a stable membrane
structure
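The link between sequence and hydrophobicity can be made concrete with a hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale, which the slides do not name and which is brought in here as one common choice, and averages it over a sliding window. The window of 19 residues and the cutoff of 1.6 are a commonly quoted rule of thumb for membrane-spanning segments, not a definitive test; the peptide is made up.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Average hydropathy of every window of the given length.

    Raises KeyError for residues outside the 20 standard amino acids.
    Returns an empty list if the sequence is shorter than the window.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical peptide: charged ends around a hydrophobic core.
peptide = "DDKKEE" + "LLIVAVLLIAVLLIVLAVL" + "KKDDEE"
profile = hydropathy_profile(peptide, window=19)

THRESHOLD = 1.6  # assumed rule-of-thumb cutoff
candidates = [i + 1 for i, value in enumerate(profile) if value > THRESHOLD]
print("window start positions above threshold:", candidates)
```

Windows whose average exceeds the cutoff are only candidates for membrane-spanning helices; dedicated prediction tools use more elaborate models.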
Proteomics
is used to investigate:
when and where proteins are expressed;
rates of protein production, degradation, and steady-state abundance;
how proteins are modified (for example, post-translational modifications
(PTMs) such as phosphorylation);
the movement of proteins between subcellular compartments;
the involvement of proteins in metabolic pathways;
how proteins interact with one another.
can provide significant biological information for many biological problems,
such as:
Which proteins interact with a particular protein of interest (for example,
the tumor suppressor protein p53)?
Which proteins are localized to a subcellular compartment (for example,
the mitochondrion)?
Which proteins are involved in a biological process (for example,
circadian rhythm)?
Drug designing
It is an important aspect in Bioinformatics in
which we identify the target for a drug and
synthesize suitable inhibitors.
Computer aided drug design
Cuts the cost and time of drug discovery considerably.
Candidate drug molecules can be selected from the huge
available databases and checked, using docking procedures, for
whether they can bind to the active site of a target.
Docking programs such as Hex, ArgusLab and AutoDock are
capable of docking small molecules to selected active sites
of target molecules and give each molecule a relative score. The
molecules predicted computationally in this way are then passed
on to the wet lab for synthesis and clinical trials
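Once each candidate has a relative docking score, choosing which molecules to pass on to the wet lab is a ranking problem. The sketch below assumes that lower (more negative) scores indicate a more favourable predicted fit, a convention used by several docking programs; the ligand names and scores are invented and are not output from Hex, ArgusLab or AutoDock.

```python
def rank_candidates(scores, top_n=3):
    """Return the top_n (name, score) pairs, best (lowest) score first."""
    return sorted(scores.items(), key=lambda item: item[1])[:top_n]


# Hypothetical docking scores; lower is assumed to mean a better predicted fit.
docking_scores = {
    "ligand_01": -7.2,
    "ligand_02": -5.1,
    "ligand_03": -9.4,
    "ligand_04": -6.8,
}

shortlist = rank_candidates(docking_scores, top_n=2)
for name, score in shortlist:
    print(name, score)
```

A good score is a computational prediction only, which is why the shortlisted molecules still go through synthesis and testing.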
E.g. the human body produces the required proteins P1, P2, P3.
A pathogenic bacterium or virus produces its own protein (X).
X can bind to P1 (causing disease).
• To stop the disease, introduce a molecule Y; Y binds
to X, and P1 is released.
The process of designing a new drug using bioinformatics tools
has opened a new area of research.
Computational techniques help in searching for drug targets
and in designing drugs in silico, but drug design still takes a long
time and a lot of money. To design a new drug, one needs to follow
this path:
1. Identify Target Disease
2. Study Interesting Compounds
3. Detect the Molecular Basis for Disease
4. Rational Drug Design Techniques
5. Refinement of Compounds
6. Quantitative Structure Activity Relationships (QSAR)
7. Solubility of Molecule
8. Drug Testing
Bioinformatics Analysis?
It is like any other lab analysis!
 You need to know your data/input sources
 You need to understand your methods and their assumptions
 You need a plan to get from point A to point B
 You need to understand your equipment
 You need to be critical and understand potential sources of
error
 You need to interpret your results
 Your results need to be reproducible
 Your results should be testable
SKILLS REQUIRED FOR A BIOINFORMATICIAN

Molecular Biology - Should have a basic
knowledge of molecular biology.
Experience with one or more molecular
biology software packages - Learn to use
sequence analysis and molecular modeling
software. Some of the molecular biology
packages are BLAST, FASTA, etc.
http://molbiol-tools.ca/
Computer Operating Systems - Windows and Linux
Database Management Systems - Oracle and MySQL (a free
database server) are widely used to store large biological data sets.
Computer Programming Languages - A bioinformatician should
know C/C++, Perl, Python, Java and HTML.


What is an Operating System?
An OS is a program that acts as an intermediary
between the user of a computer and computer
hardware.
It provides a user friendly environment in which a
user may easily develop and execute programs.
Otherwise, hardware knowledge would be mandatory
for computer programming.
So, the OS hides the complexity of the hardware from
users who do not need to deal with it.
Goals of an Operating System
 Simplify the execution of user programs and make
solving user problems easier.
 Use computer hardware efficiently.
◦ Allow sharing of hardware and software resources.
 Make application software portable and versatile.
 Provide isolation, security and protection among user
programs.
 Improve overall system reliability
 error confinement, fault tolerance, reconfiguration.
Computer System Structure
 Computer system can be divided into four components
◦ Hardware – provides basic computing resources
 CPU, memory, I/O devices

◦ Operating system :
 Controls and coordinates use of hardware among various
applications and users
◦ Application programs : define the ways in which the system
resources are used to solve the computing problems of the users
 Word processors, compilers, web browsers, database systems,
video games
◦ Users
 People, machines, other computers
In simple terms, an OS is a manager.
It manages all the available resources on a
computer. These resources can be the hard
disk, a printer, or the monitor screen, even
memory is a resource that needs to be
managed.
Examples: Unix, Linux, DOS, IRIX and Windows.
Relational Database Management
Systems (RDBMS)
• An RDBMS is a DBMS in which data is stored in the form of tables and
the relationships among the data are also stored in the form of
tables.
• A DBMS is a set of computer programs that controls the
creation, maintenance and use of a database.
• A DBMS is a system software package that supports the use of
integrated collections of data records and files known as
databases.
• A DBMS is a software program that enables the creation
and management of databases.
• A DBMS allows users and other software to store and
retrieve data in a computer.
Examples:
• Oracle
• Sybase
• DB2 (IBM)
• MS SQL Server
• MS Access
• Ingres
• PostgreSQL
• MySQL
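The idea that both the data and the relationships among the data live in tables can be shown with a very small schema. The sketch below uses SQLite through Python's built-in sqlite3 module purely because it needs no server; SQLite is not one of the systems listed above, and the table names, gene symbols and sequences are made up for illustration.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two tables: the organism_id column in "gene" records which organism
# each gene belongs to, so the relationship is itself stored as table data.
conn.executescript("""
    CREATE TABLE organism (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE gene (
        id          INTEGER PRIMARY KEY,
        symbol      TEXT NOT NULL,
        organism_id INTEGER NOT NULL REFERENCES organism(id),
        sequence    TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO organism (id, name) VALUES (1, 'Escherichia coli')")
conn.executemany(
    "INSERT INTO gene (symbol, organism_id, sequence) VALUES (?, ?, ?)",
    [("geneA", 1, "ATGGCC"), ("geneB", 1, "ATGTTT")],  # hypothetical records
)

# Join the two tables to retrieve each gene with its organism name.
rows = conn.execute("""
    SELECT g.symbol, o.name, LENGTH(g.sequence)
    FROM gene AS g
    JOIN organism AS o ON o.id = g.organism_id
    ORDER BY g.symbol
""").fetchall()

for row in rows:
    print(row)

conn.close()
```

The same table-and-join pattern carries over to the server-based systems in the list, although connection setup and SQL dialect details differ between them.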
Programming Languages
A set of rules and symbols used to construct
a computer program
A language used to interact with the
computer.
A programming language (PL) is an artificial language designed to
express computations that can be performed
by a machine, particularly a computer.
Examples: C, C++, Java, VB, Fortran, Python and Perl.
