Bioinformatics Session5
Session 5
Example 1:
Rolling a die twice: the probability that the first roll is any odd number (A) and the second roll is a 4 (B)
P(A ∩ B) = P(A) × P(B) = (3/6) × (1/6) = 3/36 = 1/12
Example 2:
Roll a six-sided die and then flip a coin.
These two events are independent.
The probability of rolling a 1 is 1/6. The probability of a head is 1/2.
The probability of rolling a 1 and getting a head is 1/6 x 1/2 = 1/12.
Independent events: the occurrence of one event has no effect on the probability of the other event occurring.
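The multiplication rule for independent events can be checked by brute force. This is a minimal Python sketch (the helper names `prob`, `first_is_odd` and `second_is_four` are made up for illustration). It enumerates the 36 equally likely outcomes of Example 1 and computes exact probabilities with fractions.

```python
from fractions import Fraction
from itertools import product

# Sample space for Example 1: two rolls of a fair die, 36 equally likely outcomes.
OUTCOMES = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event, given as a predicate on (roll1, roll2)."""
    favourable = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(favourable, len(OUTCOMES))

def first_is_odd(outcome):      # event A
    return outcome[0] % 2 == 1

def second_is_four(outcome):    # event B
    return outcome[1] == 4

p_a = prob(first_is_odd)
p_b = prob(second_is_four)
p_ab = prob(lambda o: first_is_odd(o) and second_is_four(o))
print(p_a, p_b, p_ab)           # 1/2 1/6 1/12
print(p_ab == p_a * p_b)        # True
```

The enumeration gives P(A ∩ B) = P(A) × P(B), so it agrees with the multiplication rule.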
The likelihood ratio
• The score should reflect the odds that the aligned sequences s and s′ are evolutionarily related.
• The score is high if the odds are high that the two aligned sequences are related.
Likelihood score to log-likelihood score
i.e., S(a,b) < 0 when a and b are more likely to be aligned by chance than in related sequences at a distance of 1 PAM
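Here is a small sketch of the log-odds idea. It assumes a base-2 logarithm and made-up frequencies for a hypothetical two-letter alphabet {a, b}, so these are not PAM values. A pair scores positively when it is seen more often in related sequences (q) than expected by chance (p_a × p_b). It scores negatively otherwise. The function name `log_odds` is illustrative only.

```python
import math

def log_odds(q_ab, p_a, p_b, base=2):
    """Score = log(q_ab / (p_a * p_b)).

    q_ab: frequency of a aligned with b in related sequences (hypothetical).
    p_a, p_b: background frequencies of a and b under the random model.
    """
    return math.log(q_ab / (p_a * p_b), base)

# Made-up frequencies for a two-letter toy alphabet {a, b}.
background = {"a": 0.5, "b": 0.5}
related = {("a", "a"): 0.40, ("a", "b"): 0.10}

for (x, y), q in related.items():
    score = log_odds(q, background[x], background[y])
    verdict = "more likely related" if score > 0 else "more likely chance"
    print(x, y, round(score, 2), verdict)
```

A ratio of exactly 1 (q_ab = p_a × p_b) gives a score of 0. That is the boundary between the two cases.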
PAMn substitutions
• For sequences separated by an evolutionary distance of n PAM units
• A distance of n PAM units does not necessarily mean n% divergence, because multiple substitutions can occur at the same site.
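In the standard PAM construction, the substitution probability matrix for n PAM units is the PAM1 matrix multiplied by itself n times. The sketch below applies that idea to a toy matrix: a hypothetical 4-letter alphabet with uniform background frequencies and a 1% chance of change per PAM unit. This is not Dayhoff's data. The example shows why n PAM units is not n% divergence. All names (`PAM1_TOY`, `percent_difference`, etc.) are invented for illustration.

```python
def mat_mult(x, y):
    """Multiply two square matrices given as lists of lists."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def mat_power(m, n):
    """Return m multiplied by itself n times (n >= 1)."""
    result = m
    for _ in range(n - 1):
        result = mat_mult(result, m)
    return result

# Toy "PAM1-like" matrix (hypothetical): 4 letters, uniform background,
# each residue changes with probability 1% per PAM unit.
K = 4
CHANGE = 0.01
PAM1_TOY = [[1 - CHANGE if i == j else CHANGE / (K - 1) for j in range(K)]
            for i in range(K)]
BACKGROUND = [1 / K] * K

def percent_difference(n):
    """Expected % of sites that differ after n PAM units in the toy model."""
    m = mat_power(PAM1_TOY, n)
    unchanged = sum(BACKGROUND[i] * m[i][i] for i in range(K))
    return 100 * (1 - unchanged)

for n in (1, 50, 100, 250):
    print(n, round(percent_difference(n), 1))
```

In this toy model, 1 PAM unit gives 1% difference, but 100 PAM units give only about 55%. The difference can never exceed 75%, because with four equally frequent letters a quarter of sites match by chance even after many substitutions.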
S(i,j) is the score for aligning amino acid i with amino acid j, and q(i,j) are the (positive) target frequencies, i.e., the frequencies with which i and j are aligned in related sequences.
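The Karlin-Altschul statistical framework writes a score as S(i,j) = (1/λ) ln(q(i,j) / (p_i p_j)), where p_i are background frequencies and λ is a scale factor. Turning this around, any scoring matrix with a negative expected score and at least one positive entry implies target frequencies q(i,j) = p_i p_j exp(λ S(i,j)). Here λ is the unique positive value that makes these frequencies sum to 1.

The sketch below finds λ by bisection. The 4-letter +1/-1 matrix and uniform frequencies are hypothetical, chosen because λ works out to ln 3 for them. The function names are illustrative only.

```python
import math

def implied_lambda(scores, p, iterations=200):
    """Unique positive lambda with sum_ij p_i p_j exp(lambda * S_ij) = 1."""
    pairs = [(i, j) for i in range(len(p)) for j in range(len(p))]
    expected = sum(p[i] * p[j] * scores[i][j] for i, j in pairs)
    if expected >= 0 or max(scores[i][j] for i, j in pairs) <= 0:
        raise ValueError("need a negative expected score and a positive entry")

    def f(lam):
        return sum(p[i] * p[j] * math.exp(lam * scores[i][j]) for i, j in pairs) - 1

    lo, hi = 0.0, 1.0
    while f(hi) <= 0:            # widen the bracket until f becomes positive
        hi *= 2
    for _ in range(iterations):  # bisection keeps f(lo) <= 0 < f(hi)
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def target_frequencies(scores, p):
    """Return (lambda, q) with q_ij = p_i p_j exp(lambda * S_ij)."""
    lam = implied_lambda(scores, p)
    n = len(p)
    q = [[p[i] * p[j] * math.exp(lam * scores[i][j]) for j in range(n)]
         for i in range(n)]
    return lam, q

# Hypothetical example: +1 for a match, -1 for a mismatch, uniform frequencies.
toy_scores = [[1 if i == j else -1 for j in range(4)] for i in range(4)]
lam, q = target_frequencies(toy_scores, [0.25] * 4)
print(round(lam, 4), round(math.log(3), 4))  # lambda should be close to ln(3)
print(round(sum(map(sum, q)), 6))           # target frequencies sum to 1
```

Every q(i,j) produced this way is positive, because it is a product of positive background frequencies and an exponential.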
PAM vs BLOSUM
The twilight zone (Doolittle, 1987) refers to the evolutionary distance corresponding to about 20%
identity between two proteins.
Proteins with this degree of amino acid sequence identity may be homologous, but such homology is
difficult to detect.
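Percent identity is the quantity behind the roughly 20% threshold. This is a minimal sketch that computes it for two already-aligned sequences. Definitions differ between tools. This version assumes the denominator is the number of aligned columns where neither sequence has a gap; others divide by alignment length or by the length of the shorter sequence. The sequences below are invented for illustration.

```python
def percent_identity(aligned_1, aligned_2, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Convention assumed here: identical residues divided by the number of
    aligned columns in which neither sequence has a gap.
    """
    if len(aligned_1) != len(aligned_2):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_1, aligned_2)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)

# Hypothetical aligned protein fragments: 2 of 10 positions are identical.
print(percent_identity("MKTAYIAKQR", "MQSGFLAEHK"))  # 20.0
```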
Global sequence alignment
Global Sequence Alignment: Needleman and Wunsch Algorithm
• One of the first and most important algorithms for aligning two
protein sequences
• Important because it produces an optimal alignment of protein
or DNA sequences, even allowing the introduction of gaps.
• The result is guaranteed to be optimal, yet not all possible alignments need to be evaluated.
• Exhaustively comparing every possible alignment would be too computationally expensive.
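This is a compact sketch of the dynamic-programming idea behind the Needleman and Wunsch algorithm. Each cell F[i][j] holds the best score for aligning the first i residues of one sequence with the first j residues of the other. Every cell is computed once, so alignments are never enumerated. The scoring (match +1, mismatch -1, linear gap penalty -2) is a hypothetical simplification. Protein alignments would normally use a substitution matrix such as PAM or BLOSUM, and often an affine gap penalty.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-2):
    """Global alignment of s and t with a linear gap penalty.

    Returns (score, aligned_s, aligned_t). Scoring values are illustrative.
    """
    def pair_score(a, b):
        return match if a == b else mismatch

    n, m = len(s), len(t)
    # F[i][j] = best score for aligning the prefixes s[:i] and t[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap          # s[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap          # t[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]),
                          F[i - 1][j] + gap,    # gap in t
                          F[i][j - 1] + gap)    # gap in s

    # Traceback from F[n][m] to recover one optimal alignment.
    out_s, out_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + pair_score(s[i - 1], t[j - 1]):
            out_s.append(s[i - 1])
            out_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_s.append(s[i - 1])
            out_t.append("-")
            i -= 1
        else:
            out_s.append("-")
            out_t.append(t[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_s)), "".join(reversed(out_t))

score, top, bottom = needleman_wunsch("GATTACA", "GCATGCA")
print(score)
print(top)
print(bottom)
```

Filling the table takes time proportional to n × m for sequences of lengths n and m. When several alignments tie for the best score, this traceback returns only one of them.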
How many alignments need to be compared
with a simple search?
• For two sequences of length n, the number of possible global
alignments is