
CS-434

BIOINFORMATICS
DR. UROOJ AINUDDIN
PAIRWISE SEQUENCE ALIGNMENT
CHAPTER 3
Alignment algorithms

• Alignment algorithms, both global and local, are fundamentally similar and differ only in the optimization strategy used to align similar residues.
• Both types of algorithms can be based on one of the three
methods:
1. The dot matrix method,
2. The dynamic programming method, and
3. The word method.
Dynamic programming method
• In dynamic programming (DP), the score of a later cell can only be determined once the earlier cells it depends on (towards the top left) have been scored.
• The DP methodology therefore first computes the initial values (in the top row and left column of the matrix) and uses them to obtain the values of the later cells (towards the right and bottom of the matrix).
• It is used for both local and global alignment.
• We focus on two algorithms in this class:
1. Needleman–Wunsch algorithm
2. Smith–Waterman algorithm
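The dependency rule above can be checked with a small sketch. It assumes, as in the Needleman–Wunsch recurrence used later in this chapter, that each cell reads the cell above it, the cell to its left, and the cell diagonally above-left; `fill_order` is a hypothetical helper name, not part of the course material.

```python
def fill_order(n: int, m: int) -> list[tuple[int, int]]:
    """Visit an (n+1) x (m+1) DP matrix row by row, left to right,
    checking that every neighbor a cell depends on was visited first."""
    visited: set[tuple[int, int]] = set()
    order: list[tuple[int, int]] = []
    for i in range(n + 1):              # top to bottom
        for j in range(m + 1):          # left to right
            # Neighbors read by the recurrence (only those inside the matrix).
            for pi, pj in ((i - 1, j - 1), (i - 1, j), (i, j - 1)):
                if pi >= 0 and pj >= 0:
                    assert (pi, pj) in visited, f"{(pi, pj)} needed before {(i, j)}"
            visited.add((i, j))
            order.append((i, j))
    return order


print(fill_order(1, 2))  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

Row-by-row order is one valid choice; any order that scores the top and left neighbors first (for example, column by column) would also satisfy the rule.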
Needleman–Wunsch algorithm

• It is used for global alignment.
• The best alignment is identified by quantifying, or scoring, the possible alternative alignments.
• Scoring matrices are used to reward matches and penalize mismatches and gaps, so that the alignment with the highest score can be identified.
• The scores in the matrix are integer values (e.g., +1, 0, –1).
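Scoring alternative alignments and keeping the best one can be sketched in a few lines. This is a minimal sketch, assuming a simple scheme of Match = +2, Mismatch = −1, Gap = −2 and using `-` to mark a gap; `alignment_score` is a hypothetical name.

```python
def alignment_score(top: str, bottom: str, match: int = 2,
                    mismatch: int = -1, gap: int = -2) -> int:
    """Sum the column scores of a gapped alignment ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            total += gap          # residue aligned to a gap
        elif a == b:
            total += match        # identical residues
        else:
            total += mismatch     # different residues
    return total


# Two alternative alignments of GAT with GT, scored with the same scheme.
candidates = [("GAT", "G-T"), ("GAT", "GT-")]
for top, bottom in candidates:
    print(top, bottom, alignment_score(top, bottom))  # 2, then -1
best = max(candidates, key=lambda pair: alignment_score(*pair))
print("best:", best)  # best: ('GAT', 'G-T')
```

Enumerating every alternative like this quickly becomes infeasible for longer sequences, which is why the dynamic programming method is used instead.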
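Example
X: CAGGTAGTG (length 9)
Y: CTAGTAG (length 7)
• Define a scoring scheme.
• Match = 2
• Mismatch = –1
• Gap penalty = –2
• T(0, 0) = 0
• T(i, j) is the maximum of three candidates:

$$
T(i,j) = \max
\begin{cases}
T(i-1,\,j-1) + \text{match or mismatch score} \\
T(i-1,\,j) + \text{gap penalty} \\
T(i,\,j-1) + \text{gap penalty}
\end{cases}
$$

• The matrix has rows i = 0, …, 7, labeled by the residues of Y (C, T, A, G, T, A, G), and columns j = 0, …, 9, labeled by the residues of X (C, A, G, G, T, A, G, T, G). Each cell T(i, j) is computed from its diagonal neighbor T(i − 1, j − 1), the cell above it, T(i − 1, j), and the cell to its left, T(i, j − 1).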
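A single step of the recurrence can be written as a small function that also reports which neighbors produce the maximum, since that information drives the traceback. This is a sketch assuming the example's scheme (Match = 2, Mismatch = −1, Gap = −2); `score_cell` and its parameter names are hypothetical.

```python
def score_cell(diag: int, up: int, left: int, x_j: str, y_i: str,
               match: int = 2, mismatch: int = -1,
               gap: int = -2) -> tuple[int, list[str]]:
    """Apply the recurrence to one cell and report which neighbors give the max.

    diag = T(i-1, j-1), up = T(i-1, j), left = T(i, j-1);
    x_j is the column residue (from X), y_i the row residue (from Y).
    """
    candidates = {
        "diagonal": diag + (match if x_j == y_i else mismatch),
        "up": up + gap,      # vertical move: gap in X
        "left": left + gap,  # horizontal move: gap in Y
    }
    best = max(candidates.values())
    sources = [name for name, value in candidates.items() if value == best]
    return best, sources


print(score_cell(0, -2, -2, "C", "C"))  # T(1, 1): (2, ['diagonal'])
print(score_cell(0, -2, 1, "G", "T"))   # T(2, 3): (-1, ['diagonal', 'left'])
```

The second call shows a tie: two neighbors give the same maximum, so the cell has two possible sources.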
• Add the gap penalty when moving in the horizontal or vertical direction.
• Add the match or mismatch score when moving in the diagonal direction.
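Applying these rules to the border of the matrix gives its starting values: row 0 can only be reached from T(0, 0) by horizontal moves and column 0 only by vertical moves, so every border cell is a multiple of the gap penalty. A minimal sketch, assuming Gap = −2; `init_matrix` is a hypothetical name.

```python
from typing import Optional


def init_matrix(n: int, m: int, gap: int = -2) -> list[list[Optional[int]]]:
    """Create an (n+1) x (m+1) matrix with only the border filled in.

    Row 0 uses horizontal moves only and column 0 vertical moves only,
    so T(0, j) = j * gap and T(i, 0) = i * gap. Interior cells stay None
    until the recurrence fills them.
    """
    T: list[list[Optional[int]]] = [[None] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        T[0][j] = j * gap
    for i in range(n + 1):
        T[i][0] = i * gap
    return T


T = init_matrix(7, 9)          # Y has 7 residues (rows), X has 9 (columns)
print(T[0])                    # [0, -2, -4, -6, -8, -10, -12, -14, -16, -18]
print([row[0] for row in T])   # [0, -2, -4, -6, -8, -10, -12, -14]
```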
Example
Completed matrix for X = CAGGTAGTG (columns j) and Y = CTAGTAG (rows i), with Match = 2, Mismatch = –1, Gap penalty = –2:

i/j      0    1    2    3    4    5    6    7    8    9
              C    A    G    G    T    A    G    T    G
0        0   -2   -4   -6   -8  -10  -12  -14  -16  -18
1  C    -2    2    0   -2   -4   -6   -8  -10  -12  -14
2  T    -4    0    1   -1   -3   -2   -4   -6   -8  -10
3  A    -6   -2    2    0   -2   -4    0   -2   -4   -6
4  G    -8   -4    0    4    2    0   -2    2    0   -2
5  T   -10   -6   -2    2    3    4    2    0    4    2
6  A   -12   -8   -4    0    1    2    6    4    2    3
7  G   -14  -10   -6   -2    2    0    4    8    6    4
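The whole matrix can be filled by combining the border initialization with the recurrence, visiting cells row by row. The sketch below, assuming the example's scheme and a hypothetical function name `nw_matrix`, reproduces the completed matrix for this example.

```python
def nw_matrix(x: str, y: str, match: int = 2, mismatch: int = -1,
              gap: int = -2) -> list[list[int]]:
    """Fill the Needleman-Wunsch score matrix; rows follow y, columns follow x."""
    n, m = len(y), len(x)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        T[0][j] = j * gap                # top row: horizontal moves only
    for i in range(1, n + 1):
        T[i][0] = i * gap                # left column: vertical moves only
    for i in range(1, n + 1):            # row by row, left to right, so the
        for j in range(1, m + 1):        # three neighbors are already filled
            s = match if x[j - 1] == y[i - 1] else mismatch
            T[i][j] = max(T[i - 1][j - 1] + s,  # diagonal
                          T[i - 1][j] + gap,    # from above
                          T[i][j - 1] + gap)    # from the left
    return T


T = nw_matrix("CAGGTAGTG", "CTAGTAG")
for row in T:
    print(" ".join(f"{v:4d}" for v in row))
print("global alignment score:", T[-1][-1])  # global alignment score: 4
```

The bottom-right cell holds the score of the best global alignment; the rest of the matrix is kept because the traceback needs it.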
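Example (X: CAGGTAGTG, Y: CTAGTAG)
• Traceback is done from the bottom-right cell, T(7, 9), towards the top-left cell, T(0, 0).
• At each step, the path moves to the neighboring cell (diagonal, above, or left) that contributed the highest score to the present cell. If two or more neighbors contribute the same highest score, each of them starts a separate path.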
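A single traceback path can be produced by refilling the matrix and then walking back from the bottom-right cell. This sketch assumes the example's scheme and, when several neighbors tie, prefers the diagonal, then the cell above, then the cell to the left; that tie-breaking order is an arbitrary choice, and `nw_traceback` is a hypothetical name.

```python
def nw_traceback(x: str, y: str, match: int = 2, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str, int]:
    """Return one optimal global alignment (X row, Y row) and its score."""
    n, m = len(y), len(x)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        T[0][j] = j * gap
    for i in range(1, n + 1):
        T[i][0] = i * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[j - 1] == y[i - 1] else mismatch
            T[i][j] = max(T[i - 1][j - 1] + s, T[i - 1][j] + gap, T[i][j - 1] + gap)

    top: list[str] = []      # residues of X (horizontal sequence)
    bottom: list[str] = []   # residues of Y (vertical sequence)
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[j - 1] == y[i - 1] else mismatch
            if T[i][j] == T[i - 1][j - 1] + s:      # came diagonally
                top.append(x[j - 1])
                bottom.append(y[i - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and T[i][j] == T[i - 1][j] + gap:  # came from above
            top.append("-")
            bottom.append(y[i - 1])
            i -= 1
        else:                                       # came from the left
            top.append(x[j - 1])
            bottom.append("-")
            j -= 1
    # The path was collected backwards, so reverse it.
    return "".join(reversed(top)), "".join(reversed(bottom)), T[n][m]


top, bottom, score = nw_traceback("CAGGTAGTG", "CTAGTAG")
print(top)     # CAGGTAGT-G
print(bottom)  # C---TAGTAG
print(score)   # 4
```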
Example (X: CAGGTAGTG, Y: CTAGTAG)
• The score of each alignment is calculated according to the scoring scheme defined at the beginning.
• The alignment with the highest score is considered the best global alignment.
• There may be more than one alignment with the highest score. All of them are equally good, and any one of them can be accepted as the global alignment.
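All equally good alignments can be listed by letting the traceback branch at every tie instead of picking one neighbor. This is a sketch under the example's scheme; the recursive helper `walk` and the name `nw_all_alignments` are hypothetical, and because the number of branches can grow exponentially it is only meant for short, classroom-sized sequences.

```python
def nw_all_alignments(x: str, y: str, match: int = 2, mismatch: int = -1,
                      gap: int = -2) -> tuple[int, list[tuple[str, str]]]:
    """Return the optimal global score and every alignment that achieves it."""
    n, m = len(y), len(x)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        T[0][j] = j * gap
    for i in range(1, n + 1):
        T[i][0] = i * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[j - 1] == y[i - 1] else mismatch
            T[i][j] = max(T[i - 1][j - 1] + s, T[i - 1][j] + gap, T[i][j - 1] + gap)

    results: list[tuple[str, str]] = []

    def walk(i: int, j: int, top: list[str], bottom: list[str]) -> None:
        if i == 0 and j == 0:
            results.append(("".join(reversed(top)), "".join(reversed(bottom))))
            return
        # Follow every neighbor that produced the maximum score.
        if i > 0 and j > 0:
            s = match if x[j - 1] == y[i - 1] else mismatch
            if T[i][j] == T[i - 1][j - 1] + s:          # diagonal source
                walk(i - 1, j - 1, top + [x[j - 1]], bottom + [y[i - 1]])
        if i > 0 and T[i][j] == T[i - 1][j] + gap:      # vertical source
            walk(i - 1, j, top + ["-"], bottom + [y[i - 1]])
        if j > 0 and T[i][j] == T[i][j - 1] + gap:      # horizontal source
            walk(i, j - 1, top + [x[j - 1]], bottom + ["-"])

    walk(n, m, [], [])
    return T[n][m], results


score, alignments = nw_all_alignments("CAGGTAGTG", "CTAGTAG")
print("score:", score, "optimal alignments:", len(alignments))
for top, bottom in alignments:
    print(top)
    print(bottom)
    print()
```

For this example the function finds seven alignments, each scoring 4, which agrees with the count reported at the end of the example.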
Example (X: CAGGTAGTG, Y: CTAGTAG)
• For every horizontal movement, insert a gap in the vertical sequence (Y) and apply the gap penalty.
• For every vertical movement, insert a gap in the horizontal sequence (X) and apply the gap penalty.
• For every diagonal movement, align the two residues and apply either the match or the mismatch score.
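These three movement rules can be applied mechanically to turn a path into an alignment. The sketch assumes the path is written as a string of moves read from T(0, 0) to the bottom-right cell (the traceback path in reverse), using the hypothetical codes `D` (diagonal), `H` (horizontal), and `V` (vertical); `apply_moves` is also a hypothetical name.

```python
def apply_moves(x: str, y: str, moves: str, match: int = 2,
                mismatch: int = -1, gap: int = -2) -> tuple[str, str, int]:
    """Build the gapped X and Y rows for a path and score it."""
    i = j = 0                      # i indexes Y (rows), j indexes X (columns)
    top: list[str] = []
    bottom: list[str] = []
    score = 0
    for move in moves:
        if move == "D":            # diagonal: align x_j with y_i
            top.append(x[j])
            bottom.append(y[i])
            score += match if x[j] == y[i] else mismatch
            i += 1
            j += 1
        elif move == "H":          # horizontal: gap in the vertical sequence Y
            top.append(x[j])
            bottom.append("-")
            score += gap
            j += 1
        elif move == "V":          # vertical: gap in the horizontal sequence X
            top.append("-")
            bottom.append(y[i])
            score += gap
            i += 1
        else:
            raise ValueError(f"unknown move {move!r}")
    if (i, j) != (len(y), len(x)):
        raise ValueError("path does not end at the bottom-right cell")
    return "".join(top), "".join(bottom), score


print(apply_moves("CAGGTAGTG", "CTAGTAG", "DHHHDDDDVD"))
# ('CAGGTAGT-G', 'C---TAGTAG', 4)
```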
Example (X: CAGGTAGTG, Y: CTAGTAG; Match = 2, Mismatch = –1, Gap penalty = –2)
[Figures: Alignment #1 through Alignment #7]
Example (X: CAGGTAGTG, Y: CTAGTAG)
• We get seven possible global alignments, all of which have the highest score, T(7, 9) = 4.
• Any one of these alignments can be selected as the best alignment.
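The number of co-optimal alignments can also be counted without listing them. A second table records, for each cell, how many optimal paths reach it from T(0, 0), adding up the counts of every neighbor that ties for the maximum. This is a sketch under the example's scheme, not a method taken from the slides; `count_optimal_alignments` is a hypothetical name.

```python
def count_optimal_alignments(x: str, y: str, match: int = 2, mismatch: int = -1,
                             gap: int = -2) -> tuple[int, int]:
    """Return (optimal global score, number of alignments achieving it)."""
    n, m = len(y), len(x)
    T = [[0] * (m + 1) for _ in range(n + 1)]  # scores
    N = [[0] * (m + 1) for _ in range(n + 1)]  # N[i][j]: optimal paths into (i, j)
    N[0][0] = 1
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options: list[tuple[int, int]] = []   # (candidate score, path count)
            if i > 0 and j > 0:
                s = match if x[j - 1] == y[i - 1] else mismatch
                options.append((T[i - 1][j - 1] + s, N[i - 1][j - 1]))
            if i > 0:
                options.append((T[i - 1][j] + gap, N[i - 1][j]))
            if j > 0:
                options.append((T[i][j - 1] + gap, N[i][j - 1]))
            best = max(value for value, _ in options)
            T[i][j] = best
            # Every neighbor that reaches the maximum contributes its paths.
            N[i][j] = sum(count for value, count in options if value == best)
    return T[n][m], N[n][m]


print(count_optimal_alignments("CAGGTAGTG", "CTAGTAG"))  # (4, 7)
```

Counting stays proportional to the matrix size even when the number of alignments is very large, whereas listing them does not.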
Homework

• Match = 1
• Mismatch = –1
• Gap penalty = –3
• Using the Needleman–Wunsch algorithm (NWA), globally align each pair of sequences:
1. X: ACTGTGCGT (length 9) and Y: GACGCGTG (length 8)
2. A: GACCGTATTCGAGT (length 14) and B: GTCACACATGT (length 11)
