4.1. Pairwise Alignment - 2
Alignment
Sequence alignment is a way of comparing two primary sequences of DNA,
RNA, or protein.
In principle:
two sequences are written out, one on top of the other
gacctaatcgtgaccatttgcgcgcttaaaatccgtta
attgacctaaatcgtgaccatgcgcgcttaaaatccgttaaaaa
then one sequence is moved with respect to the other, and gaps are inserted in each
sequence to maximize the number of identical pairs of bases (or similar pairs of amino
acids) lining up on top of each other.
   gaccta-atcgtgaccatttgcgcgcttaaaatccgtta
attgacctaaatcgtgaccat--gcgcgcttaaaatccgttaaaaa
("-" marks a gap)
Alignment can be done by hand using any word processor or text editor to
move lines of text and insert spaces.
It is a relatively straightforward computational problem.
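As a rough illustration of the first step (moving one sequence with respect to the other), the sketch below tries every shift of one sequence along the other, without inserting any gaps, and counts the identical pairs at each shift. The function name `best_offset` and the decision to ignore overhanging ends are assumptions of this sketch, not part of the lecture material; gap insertion is not attempted here.

```python
def best_offset(top: str, bottom: str) -> tuple[int, int]:
    """Slide `top` along `bottom` without gaps; return (offset, identities).

    `offset` is how many positions `top` is shifted to the right
    relative to `bottom` (a negative value means a shift to the left).
    """
    best = (0, -1)
    for offset in range(-len(top) + 1, len(bottom)):
        identities = 0
        for i, base in enumerate(top):
            j = i + offset
            # Positions that overhang the other sequence are simply ignored.
            if 0 <= j < len(bottom) and bottom[j] == base:
                identities += 1
        # Keep the first shift that reaches the highest identity count.
        if identities > best[1]:
            best = (offset, identities)
    return best


if __name__ == "__main__":
    top = "gacctaatcgtgaccatttgcgcgcttaaaatccgtta"
    bottom = "attgacctaaatcgtgaccatgcgcgcttaaaatccgttaaaaa"
    offset, identities = best_offset(top, bottom)
    print(f"best shift: {offset}, identical pairs: {identities}")
```

Because no gaps are inserted, a single shift can line up only one stretch of the two sequences; the remaining stretches need gaps, which is the second step described above.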
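Examples:
identical sequences   ACACACTA
                      ACACACTA
[Figure: comparison grid with ACACACTA along both axes]
[Figure: comparison grid with ACACACTA along the horizontal axis and AGCACACA along the vertical axis; label: "A gap has to be inserted"]
Counts given for this pair:   7 identities, 2 gaps   |   5 identities, 2 gaps, 2 mismatches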
alignments:    X              Y
               ATGC-GTCGTT    ATGCGTCGTT
               AT-CCG-CGAT    ATCCG-CGAT
identities     7              7
gaps           3              1
mismatches     1              2
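The counts listed for alignments X and Y can be reproduced with a short sketch. It assumes that a gap is a run of consecutive "-" characters in one sequence (so "--" counts as one gap of length 2); the function name `alignment_counts` is hypothetical.

```python
import re


def alignment_counts(top: str, bottom: str) -> dict:
    """Count identities, mismatches and gaps in an already aligned pair."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    identities = mismatches = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            continue  # gap columns are counted separately below
        if a == b:
            identities += 1
        else:
            mismatches += 1
    # Assumption: one gap = one run of consecutive '-' in either sequence.
    gap_lengths = sorted(
        len(m.group()) for seq in (top, bottom) for m in re.finditer(r"-+", seq)
    )
    return {
        "identities": identities,
        "mismatches": mismatches,
        "gaps": len(gap_lengths),
        "gap_lengths": gap_lengths,
    }


if __name__ == "__main__":
    print("X:", alignment_counts("ATGC-GTCGTT", "AT-CCG-CGAT"))
    print("Y:", alignment_counts("ATGCGTCGTT", "ATCCG-CGAT"))
```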
"Evaluation" of alignments
The overall quality of an alignment is evaluated with formulas that count the number
of identical (or similar) pairs, mismatches and gaps.
- the penalty assigned to a gap is (usually) not proportional to the size of the gap
because, from a biological perspective, larger insertions/deletions (indels) are almost as
common as single-base indels.
$$D = \min(w_1 \beta + w_2 \gamma)$$
$$D = \min\left(n_d + \sum_{k=1}^{k_{max}} w_k n_k\right)$$
$n_d$: the number of different (mismatched) elements
$w_k$: the penalty for a gap of k nucleotides
$n_k$: the number of gaps with length k
$k_{max}$: the maximum gap length allowed
The distance method is a protocol for finding the alignment with the smallest D.
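To make the distance formula concrete, the sketch below evaluates the bracketed quantity (mismatches plus gap penalties) for the candidate alignments X and Y and keeps the smaller value. The penalty function is purely an assumption for illustration: a gap of length k costs 2 + (k - 1), so the penalty grows more slowly than the gap length. The source does not give values for the penalties, and the maximum gap length is not enforced here.

```python
import re
from collections import Counter


def gap_penalty(k: int) -> int:
    """Hypothetical w_k: opening a gap costs 2, each further base costs 1."""
    return 2 + (k - 1)


def distance_score(top: str, bottom: str, w=gap_penalty) -> int:
    """Return n_d + sum over k of w_k * n_k for one aligned pair."""
    # n_d: columns where both sequences have a base and the bases differ.
    n_d = sum(1 for a, b in zip(top, bottom) if "-" not in (a, b) and a != b)
    # n_k: number of gaps (runs of '-') of each length k.
    n_k = Counter(
        len(m.group()) for seq in (top, bottom) for m in re.finditer(r"-+", seq)
    )
    return n_d + sum(w(k) * n for k, n in n_k.items())


candidates = {
    "X": ("ATGC-GTCGTT", "AT-CCG-CGAT"),
    "Y": ("ATGCGTCGTT", "ATCCG-CGAT"),
}
scores = {name: distance_score(*pair) for name, pair in candidates.items()}
best = min(scores, key=scores.get)

if __name__ == "__main__":
    print(scores, "smallest D:", best)  # {'X': 7, 'Y': 4} smallest D: Y
```

Only two hand-made alignments are compared here; a full distance method would take the minimum over all possible alignments of the two sequences.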
$$S = \max\left(n_m - \sum_{k=1}^{k_{max}} w_k n_k\right)$$
$n_m$: the number of identical (matched) elements
The similarity method is a protocol for finding the alignment with the
maximum S.
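The similarity score can be sketched in the same spirit. Here the gap penalties are given as a small table whose largest key stands in for the maximum gap length allowed; an alignment containing a longer gap is rejected. The table values and the name `similarity_score` are assumptions made for this example only.

```python
import re

# Hypothetical penalty table: W[k] is the penalty for one gap of length k.
# The largest key plays the role of kmax.
W = {1: 2, 2: 3, 3: 4}


def similarity_score(top: str, bottom: str, w=W):
    """Return n_m - sum of gap penalties, or None if a gap exceeds kmax."""
    k_max = max(w)
    # n_m: columns where both sequences carry the same base.
    n_m = sum(1 for a, b in zip(top, bottom) if a == b and a != "-")
    gap_lengths = [
        len(m.group()) for seq in (top, bottom) for m in re.finditer(r"-+", seq)
    ]
    if any(k > k_max for k in gap_lengths):
        return None  # a gap is longer than the maximum allowed length
    return n_m - sum(w[k] for k in gap_lengths)


candidates = {
    "X": ("ATGC-GTCGTT", "AT-CCG-CGAT"),
    "Y": ("ATGCGTCGTT", "ATCCG-CGAT"),
}
scores = {name: similarity_score(*pair) for name, pair in candidates.items()}
best = max(scores, key=scores.get)

if __name__ == "__main__":
    print(scores, "maximum S:", best)  # {'X': 1, 'Y': 5} maximum S: Y
```

With these assumed penalties, alignment Y scores higher than X because both have 7 identities but Y needs only one gap.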
Introduction to Bioinformatics
Matthias Sipiczki, Department of Genetics, University of Debrecen
Comments to: lipovy@tigris.unideb.hu