
Longest Common Subsequence


+ Given two sequences S1 and S2, Z is a common subsequence of S1 and S2 if Z is a subsequence of both S1 and S2. Furthermore, the elements of Z must be taken at strictly increasing indices in both S1 and S2.
+ Strictly increasing means that the elements chosen from each original sequence appear in Z in ascending order of their indices (they need not be adjacent).
+ S1 = {B, C, D, A, A, C, D}
+ Then, {A, D, B} cannot be a subsequence of S1, as the order of the elements is not the same (i.e. the indices are not strictly increasing).
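
The definition above can be checked mechanically. The following is a minimal Python sketch (Python and the helper name `is_subsequence` are illustrative choices, not part of the slides) that scans S1 once and accepts Z only if its elements are found at increasing positions.

```python
def is_subsequence(z, s):
    """Return True if z can be obtained from s by deleting zero or more elements."""
    it = iter(s)
    # `item in it` consumes the iterator up to the match, so every later
    # element of z must be found at a strictly larger index of s.
    return all(item in it for item in z)


S1 = ["B", "C", "D", "A", "A", "C", "D"]
print(is_subsequence(["A", "D", "B"], S1))       # False: B occurs only before A and D
print(is_subsequence(["C", "D", "A", "C"], S1))  # True
```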
Example

+ S1 = {B, C, D, A, A, C, D}
+ S2 = {A, C, D, B, A, C}
+ Then, common subsequences are
+ {B, C}, {C, D, A, C}, {D, A, C}, {A, A, C}, {A, C}, {C, D}, ...
+ Among these subsequences, {C, D, A, C} is the longest
common subsequence.
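
For sequences this short, the example can be confirmed by brute force. The sketch below is not the algorithm developed in these slides: it simply enumerates every subsequence of both sequences (exponential time) and intersects the two sets. The helper name `subsequences` is hypothetical.

```python
from itertools import combinations


def subsequences(s):
    """Return the set of all subsequences of s as tuples (exponential in len(s))."""
    # combinations() keeps elements in their original index order,
    # which is exactly the subsequence property.
    return {c for r in range(len(s) + 1) for c in combinations(s, r)}


S1 = "BCDAACD"
S2 = "ACDBAC"
common = subsequences(S1) & subsequences(S2)
longest = max(len(c) for c in common)
print(longest)                                                  # 4
print(sorted("".join(c) for c in common if len(c) == longest))  # ['CDAC']
```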
LCS through Recursive call

A       b   d   \0
Ind     0   1   2

B       a   b   c   d   \0
Ind     0   1   2   3   4

int LCS(i,j)
{
    if (A[i] == '\0' or B[j] == '\0')
        return 0
    else if (A[i] == B[j])
        return 1 + LCS(i+1, j+1)
    else
        return max(LCS(i+1, j), LCS(i, j+1))
}

Recursion tree for LCS(0,0). Each node shows the pair of positions and the characters compared; indented nodes are the calls it makes.

A[0], B[0]  (b, a)
    A[1], B[0]  (d, a)
        A[2], B[0]  (\0, a)
        A[1], B[1]  (d, b)
            A[2], B[1]  (\0, b)
            A[1], B[2]  (d, c)
                A[2], B[2]  (\0, c)
                A[1], B[3]  (d, d)
                    A[2], B[4]  (\0, \0)
    A[0], B[1]  (b, b)
        A[1], B[2]  (d, c)
            A[2], B[2]  (\0, c)
            A[1], B[3]  (d, d)
                A[2], B[4]  (\0, \0)
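
As a runnable counterpart to the pseudocode, here is a minimal Python sketch of the same recursion. It assumes Python strings, so reaching the end of a string plays the role of the `\0` terminator; the function name `lcs_recursive` is an illustrative choice.

```python
def lcs_recursive(a, b, i=0, j=0):
    """Length of the LCS of a[i:] and b[j:], by plain recursion."""
    # Base case: one of the strings is exhausted (the '\0' test in the slide).
    if i == len(a) or j == len(b):
        return 0
    # Matching characters extend the LCS by one.
    if a[i] == b[j]:
        return 1 + lcs_recursive(a, b, i + 1, j + 1)
    # Otherwise skip one character from either string and keep the better result.
    return max(lcs_recursive(a, b, i + 1, j), lcs_recursive(a, b, i, j + 1))


print(lcs_recursive("bd", "abcd"))  # 2
```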
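LCS through Memoization (Dynamic programming)

A       b   d   \0
Ind     0   1   2

B       a   b   c   d   \0
Ind     0   1   2   3   4

Memo table. The entry in row A[i], column B[j] is the value returned by LCS(i,j); the \0 row is the base case, and "-" marks an entry the recursion never fills.

    B →    a   b   c   d   \0
 A ↓       0   1   2   3   4
 b   0     2   2   -   -   -
 d   1     1   1   1   1   -
 \0  2     0   0   0   0   0

Recursion tree with returned values. The second visit to A[1], B[2] is answered from the table instead of being expanded again.

A[0], B[0]  (b, a)  returns 2
    A[1], B[0]  (d, a)  returns 1
        A[2], B[0]  (\0, a)  returns 0
        A[1], B[1]  (d, b)  returns 1
            A[2], B[1]  (\0, b)  returns 0
            A[1], B[2]  (d, c)  returns 1
                A[2], B[2]  (\0, c)  returns 0
                A[1], B[3]  (d, d)  returns 0+1=1
                    A[2], B[4]  (\0, \0)  returns 0
    A[0], B[1]  (b, b)  returns 1+1=2
        A[1], B[2]  (d, c)  returns 1 (looked up)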
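
A memoized version of the recursion might look like the following Python sketch. It assumes a dictionary keyed by `(i, j)` as the memo table; the names `lcs_memo`, `solve` and `memo` are hypothetical.

```python
def lcs_memo(a, b):
    """Return (LCS length, memo table) using top-down recursion with caching."""
    memo = {}  # (i, j) -> LCS length of a[i:] and b[j:]

    def solve(i, j):
        if (i, j) in memo:  # already computed: reuse the stored value
            return memo[(i, j)]
        if i == len(a) or j == len(b):
            result = 0
        elif a[i] == b[j]:
            result = 1 + solve(i + 1, j + 1)
        else:
            result = max(solve(i + 1, j), solve(i, j + 1))
        memo[(i, j)] = result
        return result

    return solve(0, 0), memo


length, memo = lcs_memo("bd", "abcd")
print(length)          # 2
print(memo[(1, 3)])    # 1
print((0, 2) in memo)  # False: this subproblem is never needed
```

Each distinct pair `(i, j)` is solved at most once, which is what removes the repeated subtrees of the plain recursion.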
LCS Algorithm through Dynamic Programming

Let X and Y be the two given sequences
Initialize a table LCS of dimension (X.length + 1) * (Y.length + 1)
X.label = X
Y.label = Y
LCS[0][] = 0
LCS[][0] = 0
Start from LCS[1][1]
Compare X[i] and Y[j]
    If X[i] = Y[j]
        LCS[i][j] = 1 + LCS[i-1][j-1]
        Point a diagonal arrow from LCS[i][j] to LCS[i-1][j-1]
    Else
        LCS[i][j] = max(LCS[i-1][j], LCS[i][j-1])
        Point an arrow from LCS[i][j] to max(LCS[i-1][j], LCS[i][j-1])
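
The steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `lcs_table` is hypothetical, the arrows are omitted, and because Python strings are 0-indexed the code reads `x[i - 1]` where the pseudocode writes X[i].

```python
def lcs_table(x, y):
    """Fill the (len(x)+1) x (len(y)+1) table of LCS lengths bottom-up."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay 0: an empty prefix has an empty LCS.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = 1 + table[i - 1][j - 1]
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


table = lcs_table("BCDAACD", "ACDBAC")
print(table[-1][-1])  # 4
```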
LCS Example
We’ll see how the LCS algorithm works on the following
example:
• X = ABCB
• Y = BDCAB

What is the Longest Common Subsequence of X and Y?

LCS(X, Y) = BCB
X = A B C B
Y = B D C A B
LCS Example (0)
    j     0   1   2   3   4   5
 i       Yj   B   D   C   A   B
 0  Xi
 1  A
 2  B
 3  C
 4  B

X = ABCB; m = |X| = 4
Y = BDCAB; n = |Y| = 5
Allocate array c[0..4, 0..5], i.e. (m+1)*(n+1) = 5*6 entries
LCS Example (1)
    j     0   1   2   3   4   5
 i       Yj   B   D   C   A   B
 0  Xi    0   0   0   0   0   0
 1  A     0
 2  B     0
 3  C     0
 4  B     0

LCS[0][] = 0
LCS[][0] = 0
LCS Example (2)
    j     0   1   2   3   4   5
 i       Yj   B   D   C   A   B
 0  Xi    0   0   0   0   0   0
 1  A     0   0
 2  B     0
 3  C     0
 4  B     0

if ( Xi == Yj )
    c[i,j] = c[i-1,j-1] + 1
else
    c[i,j] = max( c[i-1,j], c[i,j-1] )
LCS Example (3)
    j     0   1   2   3   4   5
 i       Yj   B   D   C   A   B
 0  Xi    0   0   0   0   0   0
 1  A     0   0   0   0
 2  B     0
 3  C     0
 4  B     0

LCS Example (4)
    j     0   1   2   3   4   5
 i       Yj   B   D   C   A   B
 0  Xi    0   0   0   0   0   0
 1  A     0   0   0   0   1
 2  B     0
 3  C     0
 4  B     0

LCS Example (5)
    j     0   1   2   3   4   5
 i       Yj   B   D   C   A   B
 0  Xi    0   0   0   0   0   0
 1  A     0   0   0   0   1   1
 2  B     0
 3  C     0
 4  B     0

LCS Example (6)
    j     0   1   2   3   4   5
 i       Yj   B   D   C   A   B
 0  Xi    0   0   0   0   0   0
 1  A     0   0   0   0   1   1
 2  B     0   1
 3  C     0
 4  B     0

LCS Example (7)
    j     0   1   2   3   4   5
 i       Yj   B   D   C   A   B
 0  Xi    0   0   0   0   0   0
 1  A     0   0   0   0   1   1
 2  B     0   1   1   1   1
 3  C     0
 4  B     0

LCS Example (8)
    j     0   1   2   3   4   5
 i       Yj   B   D   C   A   B
 0  Xi    0   0   0   0   0   0
 1  A     0   0   0   0   1   1
 2  B     0   1   1   1   1   2
 3  C     0
 4  B     0

LCS Example (10)
    j     0   1   2   3   4   5
 i       Yj   B   D   C   A   B
 0  Xi    0   0   0   0   0   0
 1  A     0   0   0   0   1   1
 2  B     0   1   1   1   1   2
 3  C     0   1   1
 4  B     0

LCS Example (11)
    j     0   1   2   3   4   5
 i       Yj   B   D   C   A   B
 0  Xi    0   0   0   0   0   0
 1  A     0   0   0   0   1   1
 2  B     0   1   1   1   1   2
 3  C     0   1   1   2
 4  B     0

LCS Example (12)
    j     0   1   2   3   4   5
 i       Yj   B   D   C   A   B
 0  Xi    0   0   0   0   0   0
 1  A     0   0   0   0   1   1
 2  B     0   1   1   1   1   2
 3  C     0   1   1   2   2   2
 4  B     0

LCS Example (13)
    j     0   1   2   3   4   5
 i       Yj   B   D   C   A   B
 0  Xi    0   0   0   0   0   0
 1  A     0   0   0   0   1   1
 2  B     0   1   1   1   1   2
 3  C     0   1   1   2   2   2
 4  B     0   1

LCS Example (14)
    j     0   1   2   3   4   5
 i       Yj   B   D   C   A   B
 0  Xi    0   0   0   0   0   0
 1  A     0   0   0   0   1   1
 2  B     0   1   1   1   1   2
 3  C     0   1   1   2   2   2
 4  B     0   1   1   2   2

LCS Example (15)
    j     0   1   2   3   4   5
 i       Yj   B   D   C   A   B
 0  Xi    0   0   0   0   0   0
 1  A     0   0   0   0   1   1
 2  B     0   1   1   1   1   2
 3  C     0   1   1   2   2   2
 4  B     0   1   1   2   2   3
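
The completed table from this walkthrough can be reproduced with a short Python sketch. The helper name `fill_table` is hypothetical, and the row-by-row, left-to-right fill order is assumed to match the order in which the example fills cells.

```python
def fill_table(x, y):
    """Return the table c, where c[i][j] is the LCS length of x[:i] and y[:j]."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 are all 0
    for i in range(1, m + 1):                  # one row per character of x
        for j in range(1, n + 1):              # left to right within the row
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


X, Y = "ABCB", "BDCAB"
c = fill_table(X, Y)
# Print one table row per line, labelled with the matching character of X.
for label, row in zip(" " + X, c):
    print(label, *row)
```

The bottom-right entry `c[4][5]` holds the length of the LCS, which is 3 for this example.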
LCS Algorithm Running Time

• The LCS algorithm calculates the value of each entry of the array c[m,n]
• So what is the running time?

O(m*n)
since each c[i,j] is calculated in constant time, and there are m*n entries to fill (row 0 and column 0 are only initialized to 0)
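
To make the O(m*n) bound concrete, the sketch below counts how many table entries the dynamic programming loop fills and, for comparison, how many calls the plain recursion makes on the same input. Both helper names are hypothetical, and the counters exist only for illustration.

```python
def dp_cell_count(x, y):
    """Return (LCS length, number of table entries filled by the DP loops)."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    cells = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cells += 1  # constant work per entry
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[m][n], cells


def naive_call_count(x, y):
    """Return (LCS length, number of calls made by the plain recursion)."""
    calls = 0

    def solve(i, j):
        nonlocal calls
        calls += 1
        if i == len(x) or j == len(y):
            return 0
        if x[i] == y[j]:
            return 1 + solve(i + 1, j + 1)
        return max(solve(i + 1, j), solve(i, j + 1))

    return solve(0, 0), calls


print(dp_cell_count("ABCB", "BDCAB"))  # (3, 20): m*n = 4*5 entries
print(naive_call_count("ABCB", "BDCAB"))
```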
How to find actual LCS?

c[i, j] = c[i-1, j-1] + 1               if x[i] = y[j],
c[i, j] = max(c[i, j-1], c[i-1, j])     otherwise

• So we can start from c[m,n] and go backwards
• Whenever x[i] = y[j] (so that c[i,j] = c[i-1, j-1]+1), remember x[i] (because x[i] is a part of LCS) and move to c[i-1, j-1]
• Otherwise, move to whichever of c[i-1, j] and c[i, j-1] is larger
• When i=0 or j=0 (i.e. we reached the beginning), output remembered letters in reverse order
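
The backward walk described above might be written as follows in Python. This is a minimal sketch under stated assumptions: the function name `lcs_string` is hypothetical, and ties between the upper and left neighbours are broken in favour of moving up, which is one of several valid choices when more than one LCS exists.

```python
def lcs_string(x, y):
    """Return one longest common subsequence of x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    # Walk back from c[m][n], remembering every matched character.
    remembered = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            remembered.append(x[i - 1])
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1  # the value came from the row above
        else:
            j -= 1  # the value came from the column to the left
    # Characters were collected from the end, so reverse them.
    return "".join(reversed(remembered))


print(lcs_string("ABCB", "BDCAB"))  # BCB
```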
Finding LCS
    j     0   1   2   3   4   5
 i       Yj   B   D   C   A   B
 0  Xi    0   0   0   0   0   0
 1  A     0   0   0   0   1   1
 2  B     0   1   1   1   1   2
 3  C     0   1   1   2   2   2
 4  B     0   1   1   2   2   3
Finding LCS (2)
    j     0   1   2   3   4   5
 i       Yj   B   D   C   A   B
 0  Xi    0   0   0   0   0   0
 1  A     0   0   0   0   1   1
 2  B     0   1   1   1   1   2
 3  C     0   1   1   2   2   2
 4  B     0   1   1   2   2   3
LCS (reversed order): B C B
LCS (straight order): B C B
(this string turned out to be a palindrome)
• X = LONGEST
• Y = STONE
Find the LCS of X and Y
Longest Common Subsequence Applications

1. In compressing genome resequencing data
2. To authenticate users on their mobile phones through in-air signatures
