
AN INTRODUCTION
From Biology to Bioinformatics
• Tremendous recent progress in
– Biology (molecular, genetics, etc.)
– Technological advancements
• Opened many new domains
• From Academic Interest
– To Commercial interest
• From Knowledge discovery
– To industrial development
• Added pursuit for
– Longer life
– Cure for diseases
Human Genome Project (HGP)
• Started in 1986 (formally in 1990); completed in April 2003
• Coordinated by the U.S. Department of Energy (DoE) and the National
Institutes of Health (NIH)

Goals:
■ identify all the genes in human DNA,
■ determine all the sequences of chemical base pairs
that make up human DNA,
■ store this information in databases,
■ improve tools for data analysis,
■ transfer related technologies to the private sector,
and
■ address the ethical, legal, and social issues (ELSI)
that may arise from the project.

http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
Human Genome Project (HGP)

By the Numbers
• 3 billion (3164.7 million) chemical nucleotide bases (A, C, T, and G).
• The average gene consists of 3000 bases, but sizes vary greatly (the largest
one has 2.4 million bases).
• The total number of genes is estimated at around 30,000, much lower
than previous estimates of 80,000 to 140,000.
• The functions are unknown for over 50% of discovered genes.

http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
How does the human genome stack up?
Organism                             Genome Size (Bases)   Estimated Genes
Human (Homo sapiens)                 3 billion             30,000
Laboratory mouse (M. musculus)       2.6 billion           30,000
Mustard weed (A. thaliana)           100 million           25,000
Roundworm (C. elegans)               97 million            19,000
Fruit fly (D. melanogaster)          137 million           13,000
Yeast (S. cerevisiae)                12.1 million          6,000
Bacterium (E. coli)                  4.6 million           3,200
Human immunodeficiency virus (HIV)   9700                  9
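One way to read the comparison is as gene density: estimated genes divided by genome size. The Python sketch below uses only the figures from this slide; the function names and the "genes per million bases" unit are illustrative choices, not part of the original material.

```python
# (organism, genome size in bases, estimated genes), copied from the slide.
GENOMES = [
    ("Human", 3_000_000_000, 30_000),
    ("Laboratory mouse", 2_600_000_000, 30_000),
    ("Mustard weed", 100_000_000, 25_000),
    ("Roundworm", 97_000_000, 19_000),
    ("Fruit fly", 137_000_000, 13_000),
    ("Yeast", 12_100_000, 6_000),
    ("Bacterium (E. coli)", 4_600_000, 3_200),
    ("HIV", 9_700, 9),
]


def genes_per_megabase(size_bases, genes):
    """Estimated genes per one million bases of genome."""
    return genes / (size_bases / 1_000_000)


def density_table(genomes):
    """Map each organism name to its gene density."""
    return {name: genes_per_megabase(size, genes) for name, size, genes in genomes}


if __name__ == "__main__":
    # Print organisms from the sparsest to the densest genome.
    for name, density in sorted(density_table(GENOMES).items(), key=lambda kv: kv[1]):
        print(f"{name:22s} {density:10.1f} genes per million bases")
```

With these numbers the human genome comes out as the sparsest in the list (about 10 genes per million bases), while the small genomes of E. coli and HIV are packed far more densely.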

What about the Chimpanzee Genome?

Other Genome Projects:


www.tigr.org/tdb
Chimpanzee VS Human

• Completed in August 2005
• 2.8 billion base pairs
• 29% of genes are absolutely identical
• Average number of protein-changing mutations: 2
• Some genes have radical changes
• But genome similarity is not everything
• Domestic dog genomes are 99.85% similar
• M. musculus and M. spretus show a comparable degree of similarity
• Will help to unveil the course of evolution
Quantity of Data
• Sequencing, Proteomics, Gene Expression
Data, Metabolic studies etc.
• For different organisms: from very simple
virus/bacteria to very complex homo sapiens
• These data are stored in many public and
private databases
• Human genome: 30,000 genes and 1.5
million proteins (approx.)
• One gene needs approx. 300 TB of
trace data
• Medical imaging alone generates
400MGB
Quality of Data

• Biological data is highly
– Intricate
– Interrelated
– Imperfect (Noisy)
• If you start with a protein, you need to take care of its
structure, function, interactions, sequence,
relations, etc.
• We need sophisticated repositories and tools to
deal with these data
What can Bioinformatics do?
• Organize a variety of related information
• Develop tools and techniques for the analysis and
interpretation of these data
• Gene annotation, gene function prediction, gene
library establishment
• Gene to protein identification, function and structure
prediction
• Methodology for identifying and understanding the
molecular machinery
• Modeling, simulation and inference of metabolic,
genetic and protein networks
• Provide guidelines for identifying the origin of disease
• Help in drug discovery and in curing disease
What is Bioinformatics? Definition
• Simple definition – bringing biological themes to computers

• Peter Elkin: Primer on Medical Genomics: Part V: Bioinformatics


– “Bioinformatics is the discipline that develops and applies informatics to the
field of molecular biology.”
• BISTIC Bioinformatics Definition
– “Research, development, or application of computational tools and
approaches for expanding the use of biological, medical, behavioral
or health data, including those to acquire, store, organize, archive,
analyze, or visualize such data”
• BISTIC Computational Biology Definition
– “Computational Biology: the development and application of data-
analytical and theoretical methods, mathematical modeling and
computational simulation techniques to the study of biological,
behavioral, and social systems.”
• http://www.bisti.nih.gov/
Bioinformatics

The application of computer science to molecular biology, in particular
to the study of macromolecules such as proteins and nucleic acids.

Synonyms: Molecular Bioinformatics, Computational Biology, Biocomputing
How Technology Interacts with Bioinformatics
Useful/Necessary Bioinformatics Skills
• Strong background in some aspect of molecular biology!!!
• Ability to communicate biological questions
comprehensibly to computer scientists
• Thorough comprehension of the problems in the
bioinformatics field
• Statistics (association studies, clustering,
sampling)
• Ability to filter and parse data and determine the
relationships between the data sets
• Mathematics (e.g. algorithm development)
• Engineering (e.g. robotics)
• Good knowledge of a few molecular biology software packages
(molecular modeling / sequence analysis)
• Command line computing environment (Linux/Unix knowledge)
• Data administration (esp. relational database concepts) and
Computer Programming Skills/Experience (C/C++, Sybase, Java,
Oracle) and Scripting Language Knowledge (Perl and perhaps
Python)
Explosion of Genome Sequence Data
High throughput DNA sequencing Centre
DNA sequences are meaningless!
gggtctctcttgttagaccagatctgagcctgggagctctctggctaactagggaacccactgcttaagcctcaataaagcttgccttgagtgcttcaagtagtgtgtgcccgtctgttgtgtgactctgatagctagagatcccttcagaccaaatttagtcagtgtgaaaa
atctctagcagtggcgcctgaacagggacttgaaagcgaaagagaaaccagagaagctctctcgacgcaggactcggcttgctgaagcgcgcacggcaagaggcgaggggacggcgactggtgagtacgccaaaattttgactagcggaggctagaaggagagagatgggtgc
gagagcgtcgatattaagcgggggaggattagatagatgggaaaaaattcggttaaggccagggggaaagaaaaaatatagattaaaacatttagtatgggcaagcagggagctagaacgattcgcagtcaatcctggcctattagaaacatcagaaggttgtagacaaatac
tgggacaactacaaccagcccttcagacaggatcagaagaacttagatcattatataatacagtagcaaccctctattgtgtgcatcaaaagatagatgtaaaagacaccaaggaagctttagataagatagaggaagagcaaaacaaaagtaagaaaaaagcacagcaagca
gcagctgacacaggaaatagcagccaggtcagccaaaattaccccatagtgcagaacatccaggggcaaatggtacatcaggccatatcacctagaactttaaatgcatgggtaaaagtagtagaagagaaggctttcagcccagaagtaatacccatgttttcagcattatc
agaaggagccaccccacaagatttaaacaccatgctaaacacagtggggggacatcaagcagccatgcaaatgttaaaagagaccatcaatgaggaagctgcagaatgggatagattgcatccagtgcatgcagggcctcatccaccaggccagatgagagaaccaaggggaa
gtgacatagcaggaactactagtacccttcaggaacaaatagcatggatgacaaataatccacctatcccagtaggagaaatctataagagatggataatcctgggattaaataaaatagtaaggatgtatagccctaccagcattctggacataaaacaaggaccaaaggaa
ccctttagagactatgtagaccggttctataagactctaagagccgagcaagcttcacaggaggtaaaaaattggatgacagaaaccttgttggtccaaaatgcgaacccagattgtaagactattttaaaagcattgggaccagcagctacactagaagaaatgatgacagc
atgtcagggagtgggaggacccggccataaagcaagagttttggcagaagcaatgagccaagtaacaaattcagctaccataatgatgcagaaaggcaattttaggaaccaaagaaaaattgttaagtgtttcaattgtggcaaagaagggcacatagccaaaaattgcaggg
cccctaggaaaaggggctgttggaaatgtggaaaggagggacaccaaatgaaagattgtactgagagacaggctaattttttagggaaaatctggccttcccacaggggaaggccagggaattttcctcagaacagactagagccaacagccccaccagccccaccagaagag
agcttcaggtttggggaagagacaacaactccctctcagaagcaggagctgatagacaaggaactgtatccttcagcttccctcaaatcactctttggcaacgaccccttgtcacaataaagataggggggcaactaaaggaagctctattagatacaggagcagatgataca
gtattagaagaaataaatttgccaggaagatggaaaccaaaaatgatagggggaattggaggttttatcaaagtaagacagtatgatcaaatactcgtagaaatctgtggacataaagctataggtacagtattagtaggacctacacctgtcaacataattggaagaaatct
gttgactcagattggttgcactttaaattttcccattagtcctattgaaactgtaccagtaaaattaaagccaggaatggatggcccaaaagttaaacaatggccattgacagaagaaaaaataaaagcattagtagaaatctgtacagaaatggaaaaggaaggaaaaattt
caaaaatcgggcctgaaaatccatataatactccagtatttgccataaagaaaaaagacagtactaaatggagaaaattagtagatttcagagaacttaataagaaaactcaagacttctgggaagttcaattaggaataccacatcccgcagggttaaaaaagaaaaaatca
gtaacagtactggatgtgggtgatgcatatttttcagttcccttagataaagaattcaggaagtacactgcatttaccatacctagtataaacaatgagacaccagggattagatatcagtacaatgtgcttccacagggatggaaaggatcaccagcaatattccaaagcag
catgacaaaaatcttagagccttttagaaaacaaaatccagacatagttatctatcaatacatggacgatttgtatgtaggatctgacttagaaatagggcagcatagaacaaaaatagaggaactgagacaacatctgttgaagtggggatttaccacaccagacaaaaaac
atcagaaagaacctccattcctttggatgggttatgaactccatcctgataaatggacagtacagcctatagtgctgccagaaaaggacagctggactgtcaatgacatacagaagttagtgggaaaattgaattgggcaagtcagatttacccagggattaaagtaaagcaa
ttatgtagactccttaggggaaccaaggcactaacagaagtaataccactaacaaaagaagcagagctagaactggcagaaaacagggaaattctaaaagaaccagtacatggagtgtattatgacccatcaaaagacttaatagcggaaatacagaagcaggggcaaggtca
atggacatatcaaatttatcaagagccatttaaaaatctgaaaacaggaaaatatgcaagaatgaggggtgcccacactaatgatgtaaaacaattaacagaggcagtgcaaaaaataaccacagaaagcatagtaatatggggaaagactcctaaatttaaactacccatac
aaaaagaaacatgggaaacatggtggacagagtattggcaagccacctggattcctgagtgggagtttgtcaatacccctcccttagtaaaattatggtaccagttagagaaagaacccataataggagcagaaactttctatgtagatggggcagctaacagggagactaaa
ttaggaaaagcaggatatgttactaacaaagggagacaaaaagttgtctccataactgacacaacaaatcagaagactgagttacaagcaattcttctagcattacaggattctggattagaagtaaacatagtaacagactcacaatatgcattaggaatcattcaagcaca
accagataaaagtgaatcagagatagtcagtcaaataatagagcagttaataaaaaaagaaaaggtctacctgacatgggtaccagcgcacaaaggaattggaggaaatgaacaagtagataaattagtcagtactggaatcaggaaagtactctttttagatggaatagata
aagcccaagaagaacatgaaaaatatcacagtaattggagggcaatggctagtgattttaacctgccacctgtggtagcaaaagagatagtagccagctgtgataaatgtcagctaaaaggagaagccatgcatggacaagtagactgtagtccaggaatatggcaactagat
tgtacacatttagaaggaaaaattatcctggtagcagttcatgtagccagtggatatatagaagcagaagttattccagcagaaacagggcaggaaacagcatactttctcttaaaattagcaggaagatggccagtaaaaacagtacatacagacaatggcagcaatttcac
cagtactacagttaaggccgcctgttggtgggcaggaatcaagcaggaatttggcattccctacaatccccaaagtcaaggagtagtagaatctataaataaagaattaaagaaagttataggacagataagagatcaggctgaacatcttaagacagcagtacaaatggcag
tattcatccacaattttaaaagaaaaggggggattggggggtacagtgcaggggaaagaatagtagacataatagcaacagacatacaaactaaagaactacaaaaacaaattacaaaaattcaaaattttcgggtttattacagggacagcagagatccactttggaaagga
ccagcaaagcttctctggaaaggtgaaggggcagtagtaatacaagataatagtgacataaaagtagtgccaagaagaaaagcaaagatcattagggattatggaaaacagatggcaggtgatgattgtgtggcaagtagacaggatgaggattagaacatggaaaagtttag
taaaacaccatatgtatgtttcaaggaaagctaagggatggttttatagacatcactatgaaagtactcatccgagaataagttcagaagtacacatcccactagggaatgcaaaattggtaataacaacatattggggtctacatacaggagaaagagactggcatttgggt
caaggagtctccatagaattgaggaaaaggagatatagcacacaattagaccctaacctagcagaccaactaattcatctgcattactttgattgtttttcagaatctgctataagaaatgccatattaggacatatagttagccctaggtgtgaatatcaagcaggacataa
caaggtaggatctctacagtacttggcactaacagcattagtaagaccaagaaaaaagataaagccacctttgcctagtgttacaaaactgacagaggatagatggaacaagccccagaagaccaagggccacaaagggaaccatacaatgaatggacactagaacttttaga
ggagctcaagaatgaagctgttagacattttcctaggatatggctccatagcttagggcaacatatctatgaaacttatggagatacttgggcaggagtggaagccataataagaattctgcaacaactgctgtttattcatttcagaattgggtgtcaacatagcagaatag
acattcttcgacgaaggagagcaagaaatggagccagtagatcctagactagagccctggaagcatccaggaagtcagcctaggactgcttgtaccaattgctattgtaaaaagtgttgctttcattgccaagtttgtttcataacaaaaggcttaggcatctcctatggcag
gaagaagcggagacagcgacgaagagctcctcaagacagtcagactcatcaagtttctctatcaaagcagtaagtagtacatgtaatgcaatctttacaaatattagcagtagtagcattagtagtagcagcaataatagcaatagttgtgtggtccatagtattcatagaat
ataggaaaataagaagacaaaacaaaatagaaaggttgattgatagaataatagaaagagcagaagacagtggcaatgagagtgacggagatcaggaagaattatcagcacttgtggaaatggggcacgatgctccttgggatgttaatgatctgtaaagctgcagaaaattt
gtgggtcacagtttattatggggtacctgtgtggaaagaagcaaccaccactctattttgtgcctcagatgctaaagcgtatgatacagaggtacataatgtttgggccacacatgcctgtgtacccacagaccccaacccacaagaagtagaactgaagaatgtgacagaaa
attttaacatgtggaaaaataacatggtagaccaaatgcatgaggatataattagtttatgggatcaaagcctaaagccatgtgtaaaattaaccccactctgtgttactttaaattgcactgattatgggaatgatactaacaccaataatagtagtgctactaaccccact
agtagtagcgggggaatggaggggagaggagaaataaaaaattgctctttcaatatcaccagaagcataagagataaagtgaagaaagaatatgcacttttttatagtcttgatgtaataccaataaaagatgataatactagctataggttgagaagttgtaacacctcagt
cattacacaggcctgtccaaaggtatcctttgaaccaattcccatacattattgtgccccggctggttttgcgattctaaagtgtaatgataaaaagttcaatggaaaaggaccatgtacaaatgtcagcacagtacaatgtacacatggaattaggccagtagtatcaactc
aactgctgttaaatggcagtctagcagaagaagaggtagtaattagatcagacaatttctcggacaatgctaaagtcataatagtacatctgaatgaatctgtagaaattaattgtacaagactcaacaacattacaaggagaagtatacatgtaggacatgtaggaccaggc
agagcaatttatacaacaggaataataggaaaaataagacaagcacattgtaacattagtagagcaaaatggaataacactttaaaacagatagttacaaaattaagagaacaatttaagaataaaacaatagtctttaatcaatcctcaggaggggacccagaaattgtaat
gcacagttttaattgtggaggggaatttttctactgtaattcaacacaactgtttaacagtacttggaatggtactgcatggtcaaataacactgaaggaaatgaaaatgacacaatcacactcccatgcagaataaaacaaattataaacatgtggcaggaagtaggaaaag
caatgtatgcacctcccatcagaggacaaattagatgttcatcaaatattacagggctgatattaacaagagatggtggtattaaccagaccaacaccaccgagattttcaggcctggaggaggagatatgaaggacaattggagaagtgaattatataaatataaagtagta
aaaattgaaccattaggagtagcacccaccaaggcaaagagaagagtggtgcaaagagaaaaaagagcagtgggaataataggagctatgctccttgggttcttgggagcagcaggaagcactatgggcgcagcgtcaatgacgctgacggtacaggccagacaattattgtc
tggtatagtgcaacagcagaacaatttgctgagggctattgaggcgcaacagcatctgttgcacctcacagtctggggcatcaagcagctccaagcaagagtcctggctgtggaaagatacctaagggatcaacagctcctggggttttggggttgctctggaaaactcattt
gcaccactgctgtgccttggaatactagttggagtaataaatctctgagtcagatttgggataacatgacctggatgcagtgggaaagggaaattgataattacacaagcttaatatacaacttaattgaagaatcgcaaaaccaacaagaaaagaatgaacaagagttattg
gaattagataactgggcaagtttgtggaattggtttagcataacaaattggctgtggtatataaaaatattcataatgatagtaggaggcttggtaggtttaagaatagtttttactgtactttctatagtaaatagagttaggcagggatactcaccattgtcgtttcagac
gcgcctcccagccaggaggggacccgacaggcccgaaggaatcgaagaagaaggtggagagagagacagagacagatccggtcaattagtggatggattcttagcaattatctgggtcgacctgcggagcctgtgcctcttcagctaccaccgcttgagagacttactcttga
ttgtaacgaggattgtggaacttctgggacgcagggggtgggaagccctcaaatattggtggaatctcctacaatattggattcaggaactaaagaatagtgctgttagcttgctcaacgccacagccatagcagtagctgagggaactgatagggttatagaagtattacaa
agagcttgtagagctattctccacatacctagaagaataagacagggcttagaaagggctttgcaataagatgggtggtaagtggtcaaaaagtagtaaaattggatggcctactgtaagggaaagaatgagaagagctgagccagcagcagatggggtgggagcagtatctc
gagacctggaaaaacatggagcaatcacaagtagtaatacagcaactaacaatgctgattgtgcctggctagaagcacaagaggaggaggaggtgggttttccagtcagacctcaggtacctttaagaccaatgacttacaagggagcgttagatcttagccactttttaaaa
gaaaaggggggactggaagggctaatttggtcccagaaaagacaagacatccttgatttgtgggtccaccacacacaaggctacttccctgattggcagaactacacaccagggccagggatcagatatccactgacctttggttggtgcttcaagctagtaccagttgagcc
agagaaggtagaagaggccaatgaaggagagaacaacagattgttacaccctgtgagcctgcatgggatggaggacccggagaaagaagtgttagtatggaggtttgacagccgcctagtactccgtcacatggcccgagagctgcatccggagtactacaaggactgctgac
actgagctttctacaagggactttccgctggggactttccagggaggcgtggcctgggcgggactggggagtggcgagccctcagatgctgcatataagcagctgctttttgcctgtactgggtctctcttgttagaccagatctgagcctgggagctctctggctaactagg
gaacccactgcttaagcctcaataaagcttgccttgagtgcttca
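A raw sequence like the one above only starts to say something once it is summarized or compared. As a minimal illustration (not taken from the original slides), the Python sketch below counts the four bases and computes the GC fraction of a DNA string; the function names are hypothetical.

```python
from collections import Counter


def base_composition(seq):
    """Count A, C, G and T in a DNA string, ignoring case and other characters."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def gc_content(seq):
    """Fraction of counted bases that are G or C (0.0 for an empty sequence)."""
    comp = base_composition(seq)
    total = sum(comp.values())
    return (comp["G"] + comp["C"]) / total if total else 0.0


if __name__ == "__main__":
    # The final line of the sequence shown on the slide.
    fragment = "gaacccactgcttaagcctcaataaagcttgccttgagtgcttca"
    print(base_composition(fragment))
    print(f"GC fraction: {gc_content(fragment):.2f}")
```

Summary statistics of this kind are only a first step; the following slides move on to the more informative question of what a sequence encodes.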
From gene to protein and its function(s)

Gene Function

> DNA sequence
AATTCATGAAAATCGTATACTGGTCTGGTACCGGCAACAC
TGAGAAAATGGCAGAGCTCATCGCTAAAGGTATCATCGAA
TCTGGTAAAGACGTCAACACCATCAACGTGTCTGACGTTA
ACATCGATGAACTGCTGAACGAAGATATCCTGATCCTGGG
TTGCTCTGCCATGGGCGATGAAGTTCTCGAGGAAAGCGAA
TTTGAACCGTTCATCGAAGAGATCTCTACCAAAATCTCTG
GTAAGAAGGTTGCGCTGTTCGGTTCTTACGGTTGGGGCGA
CGGTAAGTGGATGCGTGACTTCGAAGAACGTATGAACGGC
TACGGTTGCGTTGTTGTTGAGACCCCGCTGATCGTTCAGA
ACGAGCCGGACGAAGCTGAGCAGGACTGCATCGAATTTGG
TAAGAAGATCGCGAACATCTAGTAGA

> Protein sequence
MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTINVSDV
NIDELLNEDILILGCSAMGDEVLEESEFEPFIEEISTKISG
KKVALFGSYGWGDGKWMRDFEERMNGYGCVVVETP
LIVQNEPDEAEQDCIEFGKKIANI
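The step from the DNA sequence to the protein sequence on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code and a single forward reading frame that starts at the first ATG; real gene finding is considerably more involved, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_first_orf(dna):
    """Translate from the first ATG up to the first in-frame stop codon."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" for non-ACGT codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # The first 32 bases of the DNA sequence on the slide.
    print(translate_first_orf("AATTCATGAAAATCGTATACTGGTCTGGTACC"))  # MKIVYWSGT
```

The output matches the beginning of the protein sequence shown on the slide (MKIVYWSGT...), which indicates that the slide's reading frame starts at the ATG following the leading AATTC.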
Goals of Functional Genomics

What is the function of these structures?

What is the function of this sequence?

What is the function of this motif?


– the fold provides a scaffold, which can be
decorated in different ways by different
sequences to confer different functions
– knowing the fold & function allows us to
rationalise how the structure affects its function
at the molecular level
Bioinformatics Application Levels

• Basic Level
• Organization of the collected data
• Maintenance: correction and update
• Types of data sets:
• Genome sequence
• Macromolecular structures
• Functional genomics experimental data
• Others
– Phylogenetic trees, metabolic pathways, scientific
literature etc.
• Very sophisticated databases are
needed
Protein Data Bank (PDB)
http://www.rcsb.org/pdb/
PDB holdings by experimental method and molecule type

Exp. Method           Proteins   Nucleic Acids   Protein/NA Complexes   Other   Total
X-ray                 35091      973             1624                   28      37716
NMR                   5457       773             130                    7       6367
Electron Microscopy   101        10              38                     0       149
Other                 81         4               3                      0       88
Total                 40730      1760            1795                   35      44320

(As of Tuesday, Jun 26, 2007)


SWISS-PROT/TrEMBL

• Collaboration between the SIB (CH) and EMBL/EBI (UK)
• SWISS-PROT: fully annotated (manually), non-redundant,
cross-referenced, documented protein sequence database
• TrEMBL: automatically generated (from annotated
EMBL coding sequences (CDS)) and annotated using
software tools

http://ca.expasy.org/sprot/
SWISS-PROT/TrEMBL

The 10-Jul-2007 release of UniProtKB/TrEMBL contains 4,553,922 sequence entries

http://ca.expasy.org/sprot/
NCBI Entrez Genome Projects

http://www.ncbi.nlm.nih.gov/entrez/
Bioinformatics Application Levels

• Second Level
• Development of tools and resources
• For analysis and interpretation of data
• More challenging task
• More important and interesting to biologists
• One important task is searching for similarity
BLAST: Sequence Similarity Searches
VAST: Structure Similarity Searches
Bioinformatics Application Levels

• Third Level
• Modeling and simulating different bio-modules
• Use system level analysis and interpretation
• Search and unravel the origin of life, rules of
evolution
• Use the acquired knowledge for treating and curing
disease, aging
Some Bioinformatics Applications

• Information Search and Retrieval
• An indispensable tool in
Bioinformatics
• Gigantic databases are being
built up
• We need highly capable search tools
– An example is PUBMED

http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed
Genetics Based Applications

• Three types of computational problems:


• Gene Annotation
– Identify the genes
– locate promoters, binding sites etc.
• Homology Detection
– assess similarity with known genes
• Genome-wide Analysis
– derive evolutionary relationship
– identify gene families
– determination of chromosomal location
Sequence Comparison

• One of the most useful applications for
biologists
• Similarity search is helpful for
• homology detection
• distance measure
• evolutionary relationship detection
• Most popular tools are
• BLAST
• FASTA
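To make the idea of a similarity score concrete, here is a small Python sketch. It is not the heuristic search that BLAST or FASTA perform; it is the classic Needleman-Wunsch dynamic-programming score for a global alignment, named here as a stand-in. The match, mismatch, and gap values are assumed for illustration only.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of strings a and b (Needleman-Wunsch).

    Only the score is computed, using one row of the DP table at a time.
    """
    # Row 0: aligning an empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))  # 7: seven matches
    print(global_alignment_score("ACGT", "AGT"))         # 1: three matches, one gap
```

A higher score indicates more similar sequences under the chosen scoring scheme. Database search tools avoid running this full quadratic computation against every entry, which is why heuristic methods are used in practice.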
Linkage Analysis

• Used to identify the chromosomal location of
genes
• Involves the analysis of large amounts of
data
• Has important implications for disease
identification
• Many programs are available
• http://linkage.rockefeller.edu
Phylogenetic Analysis

• Also known as molecular taxonomy


• Evolutionary relationship is presented in
the form of a tree
• One popular tool is PHYLIP
Rational Drug Design

• Understanding how structures bind other molecules (function)


• Designing inhibitors
• Docking, structure modeling
Drug Lead Screening & Docking

Complementarity
- Shape
- Chemical
- Electrostatic
Computer Aided Drug Design (CADD)

• A very recent, emerging discipline
• Uses
– Bioinformatics Tools
– Chemoinformatics
– Combinatorial Chemistry
• Commercially very important
– Some tools are already available
Drug Development Life Cycle
(With the aid of Bioinformatics)

• Discovery (2 to 10 years)
• Preclinical Testing (lab and animal testing)
• Phase I (20-30 healthy volunteers used to check for safety and dosage)
• Phase II (100-300 patient volunteers used to check for efficacy and side effects)
• Phase III (1000-5000 patient volunteers used to monitor reactions to
long-term drug use)
• FDA Review & Approval
• Post-Marketing Testing

[Figure: timeline of the stages above on an axis from 0 to 16 years]

7 – 15 Years!
Drug lead screening

• 5,000 to 10,000 compounds screened
• 250 lead candidates in Preclinical Testing
• 5 drug candidates enter Clinical Testing
– 80% pass Phase I
– 30% pass Phase II
– 80% pass Phase III
• One drug approved by the FDA
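The funnel arithmetic on this slide can be checked directly. The Python sketch below takes the slide's numbers at face value and assumes the three phase pass rates apply independently and in sequence; the function name is illustrative.

```python
def expected_approvals(candidates, pass_rates):
    """Expected number of candidates surviving every stage in sequence."""
    expected = float(candidates)
    for rate in pass_rates:
        expected *= rate
    return expected


# Pass rates for Phase I, II and III as given on the slide.
PHASE_PASS_RATES = [0.80, 0.30, 0.80]

if __name__ == "__main__":
    survivors = expected_approvals(5, PHASE_PASS_RATES)
    print(f"Expected approvals from 5 clinical candidates: {survivors:.2f}")
```

Five candidates times 0.8, 0.3 and 0.8 gives roughly 0.96, which is consistent with the single approved drug at the bottom of the funnel, out of the 5,000 to 10,000 compounds screened at the top.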
Systems Biology

• System-level identification of organisms and
organelles
• How does the system work with its
constituents?
• How are outputs generated from given
inputs?
• More concerned with modeling and
simulations
• Holds promise for disease
treatment and cure
Applications of Bioinformatics (Summary)

[Figure: application areas arranged around a core of data analysis,
algorithms, visualization, statistics, etc., illustrated with chemical
structures, genome sequence fragments, and aligned protein sequences
(entries d1dhfa_, d8dfr, d4dfra_, d3dfr)]

• Search for new drugs
• DNA chips
• Genetic Variations
• Biochemical Networks
• Optimizing therapies
• Genomes
• Proteins
• Molecular Interactions
• Structure Prediction
• Sequence Analysis
