
Bioinformatics

(BIO213)

Session 1
Slide content: Various textbooks, Internet sources
Introduction to the course:
• Me: Krishna Swamy
• TF: Aishwarya Joshi
• Contact: aishwarya.j@ahduni.edu.in
Logistics:
• Scribing: 20%
• Psets and Presentations: 30% (20% + 10%)
• Tests and Exams: 50%
Groups and Scribing
• Class is divided into 8 groups of 7 students
• Grouping is done to
• encourage discussion
• help each other in solving problems
• Others
• Scribing: Taking detailed notes in the class, such that it can be used as
reading material.
• Each group will scribe a randomly allocated session in permutation
• Some groups might have to scribe more than one session …
• Scribed notes will be distributed among all the students.
• Questions regarding the notes from a session will be addressed by the group
that made the notes.
Psets and presentations:
• Psets will be given to you at periodic intervals with a deadline.
• Submissions will be at group level.
• Deadlines are set in stone: if you miss a deadline, your
group loses points.
• Group members who take advantage of the other members and do
not contribute to Psets/Presentations will be penalized.
• Complaints must be made in writing.
What is this course about?
• Basics of bioinformatics, emphasizing the underlying concepts, with a
sprinkle of programming and algorithms
• At the end of the course, you will be familiar with:
• Bioinformatics of nucleotide and protein sequences
• Sequence alignments and their statistics (LSA, BLAST, MSA etc.)
• Some frequently used databases
• Molecular phylogeny
• Construction of phylogenetic trees
• Phylogenetic analysis
• Bioinformatics of protein structure and their applications
• Protein structure prediction
• Alignment
• Protein-protein interactions (PPIs)
What is this course not about?
• Running tools or software (to a large extent): you don't need a
course for that.
• Building databases: too specific for a 1st level course and needs a lot
of programming
• Elaborate knowledge of databases: Again, each database has a
purpose.
Reference books:
• Understanding Bioinformatics by Marketa Zvelebil & Jeremy O. Baum
• Handouts for the course are from this book
• I will be referring to several other books and sources.
• The scribed notes should come in handy for your exams.
Questions?
Applications of Bioinformatics!
Methodological principles for ancestral genome reconstruction

Genome Biol 20, 29 (2019)


Bioinformatics has been an integral part of
disease biology: cancer, Alzheimer's …
Repurposing bioinformatics tools to
tackle the pandemic
• Viruses: They are so good at being bad
• Coronaviruses are a group of relatively large RNA viruses that infect
birds and mammals, including humans
• They generally have no measurable effect or cause only mild symptoms,
such as those of a common cold.
• Bioinformatic analysis of SARS-CoV-2 data has accelerated the
research on these viruses and led to potential solutions:
• Detecting and tracking of viral variant diversity (NGS, Mapping and
comparative genomics)
• Designing antibodies/vaccines (Protein bioinformatics, Molecular dynamics
and drug discovery)
Schematic representation of the diversity of spike
protein variants circulating in India

Using maximum likelihood-based phylogeny reconstruction.


Node = specific spike protein variant.
Red or green arrow = higher or lower stability index, respectively, of the variant compared to the ancestor it emerged from.
Black arrow = variant for which stability could not be determined because no isolate exists in the database.
Hypothetical node with H1083Q mutation gives rise to two derived variants, H1083Q:R78M and H1083Q:E583D
Why should anyone be
interested in Bioinformatics?

• Resurrection of dinosaurs using the blood from
the stomachs of insects encased in tree sap, which later
fossilized into amber.
• In the book, Dr. Henry Wu explains that DNA
techniques were used in reconstructing the
extinct dinosaur genomes.
• He describes how the fragmented pieces of dino-DNA from
the blood were assembled together.
• Since they don't have the entire genome,
they "fill in the gaps" with modern-day
frog DNA.
• Application of Bioinformatics
Here is the Dino DNA from the book Jurassic Park.
At one point during his discussion, Dr. Wu points to a computer screen and remarks "Here you
see the actual structure of a small fragment of dinosaur DNA."
>JurassicPark DinoDNA from the book Jurassic Park
gcgttgctgg cgtttttcca taggctccgc ccccctgacg agcatcacaa aaatcgacgc
ggtggcgaaa cccgacagga ctataaagat accaggcgtt tccccctgga agctccctcg
tgttccgacc ctgccgctta ccggatacct gtccgccttt ctcccttcgg gaagcgtggc tgctcacgct
gtaggtatct cagttcggtg taggtcgttc gctccaagct gggctgtgtg ccgttcagcc cgaccgctgc
gccttatccg gtaactatcg tcttgagtcc aacccggtaa agtaggacag gtgccggcag
cgctctgggt cattttcggc gaggaccgct ttcgctggag atcggcctgt cgcttgcggt attcggaatc
ttgcacgccc tcgctcaagc cttcgtcact ccaaacgttt cggcgagaag caggccatta
tcgccggcat ggcggccgac gcgctgggct ggcgttcgcg acgcgaggct ggatggcctt
ccccattatg attcttctcg cttccggcgg cccgcgttgc aggccatgct gtccaggcag gtagatgacg
accatcaggg acagcttcaa cggctcttac cagcctaact tcgatcactg gaccgctgat
cgtcacggcg atttatgccg caagtcagag gtggcgaaac ccgacaagga ctataaagat
accaggcgtt tcccctggaa gcgctctcct gttccgaccc tgccgcttac cggatacctg tccgcctttc
tcccttcggg ctttctcatt gctcacgctg taggtatctc agttcggtgt aggtcgttcg ctccaagctg
acgaaccccc cgttcagccc gaccgctgcg ccttatccgg taactatcgt cttgagtcca
acacgactta acgggttggc atggattgta ggcgccgccc tataccttgt ctgcctcccc
gcggtgcatg gagccgggcc acctcgacct gaatggaagc cggcggcacc tcgctaacgg
ccaagaattg gagccaatca attcttgcgg agaactgtga atgcgcaaac caacccttgg
ccatcgcgtc cgccatctcc agcagccgca cgcggcgcat ctcgggcagc gttgggtcct
gcgcatgatc gtgctagcct gtcgttgagg acccggctag gctggcgggg ttgccttact
atgaatcacc gatacgcgag cgaacgtgaa gcgactgctg ctgcaaaacg tctgcgacct
atgaatggtc ttcggtttcc gtgtttcgta aagtctggaa acgcggaagt cagcgccctg
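Before searching a sequence like this one programmatically, it helps to strip the spaces and line breaks that come from the printed layout. The following is a minimal Python sketch, not part of the original slides: the function names read_fasta_record and base_composition are hypothetical, and the example input is only the first two blocks of the sequence above.

```python
from collections import Counter


def read_fasta_record(text):
    """Return (header, sequence) from a single FASTA record.

    Spaces and line breaks inside the sequence are removed,
    and the sequence is upper-cased.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    header = lines[0][1:]
    sequence = "".join("".join(lines[1:]).split()).upper()
    return header, sequence


def base_composition(sequence):
    """Return (per-base counts, GC fraction) for a DNA sequence."""
    counts = Counter(sequence)
    gc = (counts["G"] + counts["C"]) / len(sequence) if sequence else 0.0
    return counts, gc


# Illustrative input: only the first 20 bases of the Dino DNA record.
example = """>JurassicPark DinoDNA from the book Jurassic Park
gcgttgctgg cgtttttcca
"""
header, seq = read_fasta_record(example)
counts, gc = base_composition(seq)
print(header, len(seq), round(gc, 2))
```

Pasting the full record in place of the example string gives a single continuous sequence suitable for a search tool or for further scripting.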
Boguski probes into the published DNA fragment
• In 1992, a young scientist at the NIH, Dr. Mark Boguski, having read the book
"Jurassic Park", entered this sequence into a text editor and searched it against all of the
DNA sequences known at the time.
• Mark wrote up his findings and submitted a manuscript to the journal
BioTechniques, as a tongue-in-cheek joke.
• His manuscript was accepted and published. (Boguski, M.S. A Molecular
Biologist Visits Jurassic Park. (1992) BioTechniques 12(5):668-669).

Reading Assignment: Read Duncan et al, 2016 and follow the steps described in the paper and submit a report
"The Lost World" Dino-DNA Analysis
• Mark's published article was brought to Michael Crichton's attention.
• For the sequel, "The Lost World", Dr. Crichton used Mark as a
consultant.
• Mark chose a DNA sequence from a living organism which is much
more closely related to the dinosaurs.
• Mark also mixed in some frog (Xenopus) DNA, just as Dr. Wu
described, to fill in the holes in the dino-genomes.
• However, Mark played a little trick on Mr. Crichton by embedding a
message in the protein translation of the DNA sequence which he
submitted for use in the book.
Here is the sequence Mark gave Michael Crichton for the book "The Lost World":
>LostWorld DinoDNA from the book The Lost World
gaattccgga agcgagcaag agataagtcc tggcatcaga tacagttgga gataaggacg gacgtgtggc agctcccgca gaggattcac tggaagtgca
ttacctatcc catgggagcc atggagttcg tggcgctggg ggggccggat gcgggctccc ccactccgtt ccctgatgaa gccggagcct tcctggggct gggggggggc
gagaggacgg aggcgggggg gctgctggcc tcctaccccc cctcaggccg cgtgtccctg gtgccgtggg cagacacggg tactttgggg accccccagt
gggtgccgcc cgccacccaa atggagcccc cccactacct ggagctgctg caaccccccc ggggcagccc cccccatccc tcctccgggc ccctactgcc
actcagcagc gggcccccac cctgcgaggc ccgtgagtgc gtcatggcca ggaagaactg cggagcgacg gcaacgccgc tgtggcgccg ggacggcacc
gggcattacc tgtgcaactg ggcctcagcc tgcgggctct accaccgcct caacggccag aaccgcccgc tcatccgccc caaaaagcgc ctgcgggtga
gtaagcgcgc aggcacagtg tgcagccacg agcgtgaaaa ctgccagaca tccaccacca ctctgtggcg tcgcagcccc atgggggacc ccgtctgcaa
caacattcac gcctgcggcc tctactacaa actgcaccaa gtgaaccgcc ccctcacgat gcgcaaagac ggaatccaaa cccgaaaccg caaagtttcc
tccaagggta aaaagcggcg ccccccgggg gggggaaacc cctccgccac cgcgggaggg ggcgctccta tggggggagg gggggacccc tctatgcccc
ccccgccgcc ccccccggcc gccgcccccc ctcaaagcga cgctctgtac gctctcggcc ccgtggtcct ttcgggccat tttctgccct ttggaaactc cggagggttt
tttggggggg gggcgggggg ttacacggcc cccccggggc tgagcccgca gatttaaata ataactctga cgtgggcaag tgggccttgc tgagaagaca
gtgtaacata ataatttgca cctcggcaat tgcagagggt cgatctccac tttggacaca acagggctac tcggtaggac cagataagca ctttgctccc tggactgaaa
aagaaaggat ttatctgttt gcttcttgct gacaaatccc tgtgaaaggt aaaagtcgga cacagcaatc gattatttct cgcctgtgtg aaattactgt gaatattgta
aatatatata tatatatata tatatctgta tagaacagcc tcggaggcgg catggaccca gcgtagatca tgctggattt gtactgccgg aattc
Assignment on “Lost World” Dino DNA
• Select, copy, and paste the "Lost World" sequence again into the web form: 
Translating BLAST Search.
• This type of search 'translates' the DNA sequence to six protein sequences and searches the
protein database.
• This search takes longer but is much more informative about the relationship between the probe
DNA sequence and the hits in the database.
• Proteins use 20 letters instead of 4, which made it easier for Mark to create a hidden message.
• When the analysis is finished look at the best pairwise alignment by clicking on the score
value in the right-hand column or scroll down past the hit list to the first alignment -- Can you
find Mark's hidden message?
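The six-frame translation that a translating BLAST search performs on the query can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BLAST implementation: it uses the standard genetic code, marks stop codons with "*" and any codon containing a non-ACGT character with "X", and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; '*' marks a stop."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate a DNA string in three forward and three reverse frames."""
    dna = "".join(dna.split()).upper()  # drop spaces and line breaks
    frames = {}
    for strand, s in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


# Tiny illustrative input, not taken from the book's sequence.
for name, protein in six_frame_translations("atggcc tga").items():
    print(name, protein)
```

Running six_frame_translations on the full "Lost World" sequence and reading the six outputs is a quick way to see why a 20-letter alphabet leaves room for readable text.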
Prerequisite for this course:
The Central Dogma of Biology
The central dogma of Biology: A crash course
Deoxyribonucleic acid (DNA) and Ribonucleic acid (RNA)

• DNA Double helix structure,


• Base-pairing: Nitrogen bases, bonds, etc …
microbenotes.com
Transcription

• DNA-directed RNA polymerases in eukaryotes: RNA Pol I, II and III (plants additionally have Pol IV and V)
• What are control elements?
Transcription vs Replication
Splicing

Gehring and Roignant, 2020, Trends in Genetics


Translation
Codons and Amino acid
Levels of protein structure
Dihedral angles and Ramachandran plot
The relative probabilities of amino
acids in common secondary structures

G N Ramachandran
Visualization of aspects of protein structure
Mind map of key factors in central dogma of
biology
Next session:
• Crash course on the central dogma of Biology
• Sequence alignment:
• Principle of alignment
• Gap penalties and scoring schemes

Reading Assignment: Read Duncan et al, 2016 and follow the steps described in the paper.
The procedure for the BLAST analysis of the Dino-DNA from "The Lost World" is also given in Duncan et al, 2016.
Submit a report on the exercises listed in the paper and guess Mark Boguski’s message.
Assignment submission is at group level
Deadline: Before August 27, 2021
Thank You
