Theory I Algorithm Design and Analysis: (13 - Edit Distance and Approximate String Matching)
Edit distance
Given: two strings A = a1 a2 ... am and B = b1 b2 ... bn
Wanted: the minimal cost D(A,B) of a sequence of edit operations
that transforms A into B.
Edit operations:
1. Replace one character in A by a character from B
2. Delete one character from A
3. Insert one character from B into A
Cost model:
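\[
c(a, b) =
\begin{cases}
1 & \text{if } a \neq b \\
0 & \text{if } a = b
\end{cases}
\qquad (a = \varepsilon \text{ or } b = \varepsilon \text{ possible})
\]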
We assume the triangle inequality holds for c:
\[ c(a, c) \le c(a, b) + c(b, c) \]
Hence each character needs to be changed at most once.
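The cost model can be written down in a few lines. This is a minimal sketch, assuming that a or b may be the empty character (represented here by the hypothetical constant `EPS`), so that a single function covers replacement, deletion and insertion. The names `EPS`, `c` and `satisfies_triangle_inequality` are illustrative choices, not taken from the slides.

```python
from itertools import product

EPS = ""  # assumption: the empty string stands for the empty character


def c(a: str, b: str) -> int:
    """Unit cost: 0 if a == b, 1 otherwise; a or b may be EPS."""
    return 0 if a == b else 1


def satisfies_triangle_inequality(symbols) -> bool:
    """Check c(x, z) <= c(x, y) + c(y, z) for all triples of symbols."""
    return all(c(x, z) <= c(x, y) + c(y, z)
               for x, y, z in product(symbols, repeat=3))


# c("a", "b") is a replacement, c("a", EPS) a deletion, c(EPS, "b") an insertion.
print(c("a", "a"), c("a", "b"), c("a", EPS), c(EPS, "b"))
print(satisfies_triangle_inequality(["a", "b", "c", EPS]))
```

Under the unit-cost model the triangle inequality holds automatically; the check only becomes interesting if `c` is replaced by a different cost table.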
Trace as a representation of edit sequences
A = b a a c a a b c
B = a b a c b c a c
or, using indels:
A = - b a a c a - a b c
B = a b a - c b c a - c
Edit distance (cost): 5
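The cost of a trace written with indels can be checked column by column. The following sketch assumes unit costs and uses `-` as the gap symbol; the function name `trace_cost` and the variables `A_ROW` and `B_ROW` are illustrative.

```python
def trace_cost(row_a: str, row_b: str, gap: str = "-") -> int:
    """Sum the unit costs of all columns of an alignment with indels."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    cost = 0
    for x, y in zip(row_a, row_b):
        if x == gap and y == gap:
            raise ValueError("a column cannot consist of two gaps")
        if x != y:
            # (x, gap) is a deletion, (gap, y) an insertion,
            # any other mismatch a replacement; each costs 1.
            cost += 1
    return cost


# The alignment from the slide, written without spaces.
A_ROW = "-baaca-abc"
B_ROW = "aba-cbca-c"
print(trace_cost(A_ROW, B_ROW))  # prints 5
```

This only evaluates one given trace; it does not show that no cheaper trace exists.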
Splitting an optimal trace yields two optimal sub-traces,
so dynamic programming can be used.
OPTIMAL SUBSTRUCTURE
Let Au = a1 ... au with u < i and Bt = b1 ... bt with t < j be substrings (prefixes) of A and B,
respectively.
Let the best solution be D(Au, Bt) = D(Au-1, Bt-1) + add(Au, Bt)
If there existed ...
OPTIMAL SOLUTION
...
REPEATED SUBPROBLEMS
[Figure: recursion tree of subproblems (i,j) with root (4,4) and children (3,4), (4,3) and (3,3); nodes such as (3,3), (2,3) and (3,2) occur several times]
\[
D_{m,n} = \min
\begin{cases}
D_{m-1,n-1} + c(a_m, b_n) \\
D_{m-1,n} + 1 \\
D_{m,n-1} + 1
\end{cases}
\]
[Figure: cell D(i,j) computed from its neighbouring cells, among them D(i-1,j) and D(i,j-1), with edge labels +d, +1 and +1]
Recurrence equation:
\[
D_{i,j} = \min
\begin{cases}
D_{i-1,j-1} + c(a_i, b_j) \\
D_{i-1,j} + 1 \\
D_{i,j-1} + 1
\end{cases}
\]
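The recurrence translates directly into a table that is filled row by row. The sketch below assumes unit costs and the boundary values D(0,0) = 0, D(i,0) = i and D(0,j) = j (i deletions, j insertions), which are not stated in the text above; the function names are illustrative.

```python
def edit_distance_table(a: str, b: str):
    """Return the (len(a)+1) x (len(b)+1) matrix D of prefix distances."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i  # assumed boundary: delete a[0..i-1]
    for j in range(1, n + 1):
        D[0][j] = j  # assumed boundary: insert b[0..j-1]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            replace = D[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1)
            delete = D[i - 1][j] + 1
            insert = D[i][j - 1] + 1
            D[i][j] = min(replace, delete, insert)
    return D


def edit_distance(a: str, b: str) -> int:
    """Edit distance D(A,B) is the bottom-right entry of the table."""
    return edit_distance_table(a, b)[len(a)][len(b)]


print(edit_distance("baacaabc", "abacbcac"))  # prints 5
```

Filling the table takes O(m·n) time and space; if only the distance is needed, keeping two rows at a time would be enough.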
Example
[Table: example distance matrix D]