Download as pdf or txt
Download as pdf or txt
You are on page 1of 30

Design and Analysis of Algorithms

String Matching Problem


(Knuth-Morris-Pratt Algorithm)

Dr. D. P. Acharjya
Professor, SCOPE
SJT Annex – 201 E
2/28/2024 Dr. D. P. Acharjya 1
String Matching Problem
 Finding all occurrences of a pattern (String) in a text
leads to string matching problem.
 It arises frequently in text editing programs where text
is a document, and the string searched for is a
particular word.
 String-matching algorithms are also used to search for
particular patterns in DNA sequences.
 For Example:

2/28/2024 Dr. D. P. Acharjya 2


Mathematical Formulation
 Assume that the text is an array T[1  n] of length n.
The pattern is an array P[1  m] of length m. It is to be
noted that m ≤ n.

 It is to be noted that the elements of P and T are


characters drawn from a finite alphabet Σ. For example,
Σ = {0, 1} or Σ = {a, b, , z}.

 The character arrays P and T are often called strings of


characters.
2/28/2024 Dr. D. P. Acharjya 3
Continued…
 The pattern P occurs with shift s in text T (or pattern P
occurs beginning at position s + 1 in text T) if
0 ≤ s ≤ n - m and T[s+1  s+m] = P[1  m]
 It means that T[s + j] = P[j], for 1 ≤ j ≤ m.
 If P occurs with shift s in T, then it is a valid shift
otherwise, s is an invalid shift.
 The string-matching problem is the problem of finding
all valid shifts with which a given pattern P occurs in
a given text T.

2/28/2024 Dr. D. P. Acharjya 4


Prefix Function
 The prefix function π for a pattern encapsulates
knowledge about how the pattern matches against
shifts of itself.
 This information can be used to avoid testing useless
shifts in the naïve pattern-matching algorithm.
 Simultaneously, it avoid the precomputation of δ
(transition function of finite automaton) for a string-
matching automaton.
 It minimizes the total computation in comparison with
finite automaton string matching.

2/28/2024 Dr. D. P. Acharjya 5


Knuth-Morris-Pratt Algorithm
KMP-MATCHER(T, P)
1. n ← Length[T]
2. m ← Length[P]
3. π ← COMPUTE PREFIX FUNCTION(P)
4. q ← 0
5. for i ← 1 to n
6. do while q > 0 and P[q + 1] ≠ T[i]
7. do q ← π[q]
8. if P[q + 1] = T[i]
9. then q ← q + 1
10. if q = m
11. then print “Pattern occurs with shift” (i – m)
12. q ← π[q]
2/28/2024 Dr. D. P. Acharjya 6
Compute Prefix Function
COMPUTE-PREFIX-FUNCTION(P)
1. m ← Length[P]
2. π[1] ← 0
3. k ← 0
4. for q ← 2 to m
5. do while k > 0 and P[k + 1] ≠ P[q]
6. do k ← π[k]
7. if P[k +1] = P[q]
8. then k ← k+1
9. π[q] ← k
10. Return π
2/28/2024 Dr. D. P. Acharjya 7
Computing Time
 The for loop at step 4 of compute prefix function is
computed (m-1) times. We associate a potential of k
with the current state k of the algorithm. This potential
has an initial value of 0.
 The only other line that affects k is line 8, which
increases k by at most one during each execution of
the for loop body. Since the number of outer-loop
iterations is (m - 1), the computing time for compute
prefix function is Θ(m).
 The total actual worst-case running time is Θ(m). It is
considered as pre-processing time.
2/28/2024 Dr. D. P. Acharjya 8
Continued…
 The for loop at step 5 of KMP algorithm is computed
n - times. We associate a potential of q with the
current state q of the algorithm. This potential has an
initial value of 0.
 The line that affects q is line 9, which increases q by at
most one during each execution of the for loop body.
 Since the number of outer-loop iterations is n, the
computing time for KMP algorithm is Θ(n).
 The total computing time for the algorithm is: Θ(m) +
Θ(n).
 The algorithm computes the match in linear time.
2/28/2024 Dr. D. P. Acharjya 9
Numerical Illustration
 Consider a text T =
bacbabababacaca, find all the
valid shifts for the pattern P =
ababaca.
 Compute Prefix Function
 m = Length[P] = 7
  (1) = 0
 k=0
 q=2
 k > 0 and P[1]P[2] (False)
2/28/2024 Dr. D. P. Acharjya 10
Continued …
 q=2
 k > 0 and P[1]P[2] (False)
 P[1] = P[2] (False)
 [2] = k = 0
 q=3
 k > 0 and P[1]P[3] (False)
 P[1] = P[3] (True)
 k=k+1=1
 [3] = 1

2/28/2024 Dr. D. P. Acharjya 11


Continued …
 q=4
 k=1>0 and P[2]P[4] (False)
 P[k+1=2] = P[4] (True)
 k=k+1=2
 [4] = 2
 q=5
 k=2>0 and P[3]P[5] (False)
 P[k+1=3] = P[5] (True)
 k=k+1=3
 [5] = 3
2/28/2024 Dr. D. P. Acharjya 12
Continued …
 q=6
 k=3>0 and P[4]P[6] (True)
 k = [3] = 1
 k=1>0 and P[2]P[6] (True)
 k = [1] = 0
 k=0>0 and P[1]P[6] (False)
 P[k+1=1] = P[6] (False)
 [6] = 0 (k)

2/28/2024 Dr. D. P. Acharjya 13


Continued …
 q=7
 k=0>0 and P[1]P[7] (False)
 P[k+1=1] = P[7] (True)
 k = k+1 =1
 [7] = 1 (k)

2/28/2024 Dr. D. P. Acharjya 14


Continued …
 n = Length [T] = 15
 m = Length[P] = 7
 Compute 

 q=0
 For i = 1 to 15

2/28/2024 Dr. D. P. Acharjya 15


Continued …
 For q = 0 and i = 1
 q >0 and P[1]  T[1] (False)
 P[1] = T[1] (False)
 q  m (7)

2/28/2024 Dr. D. P. Acharjya 16


Continued …
 For q = 0 and i = 2
 q >0 and P[1]  T[2] (False)
 P[1] = T[2] (True)
 q = q +1 = 1
 q (1)  m (7)

2/28/2024 Dr. D. P. Acharjya 17


Continued …
 For q = 1 and i = 3
 q=1>0 and P[2]T[3](True)
 q = [q =1] = 0
 q=0>0 and P[1]T[3](False)
 P[1] = T[3] (False)
 q (0)  m (7)

2/28/2024 Dr. D. P. Acharjya 18


Continued …
 For q = 0 and i = 4
 q>0 and P[1]T[4](False)
 P[1] = T[4] (False)
 q (0)  m (7)

2/28/2024 Dr. D. P. Acharjya 19


Continued …
 For q = 0 and i = 5
 q>0 and P[1]T[5](False)
 P[1] = T[5] (True)
 q = q + 1 =1
 q (1)  m (7)

2/28/2024 Dr. D. P. Acharjya 20


Continued …
 For q = 1 and i = 6
 q>0 and P[2]T[6](False)
 P[2] = T[6] (True)
 q = q + 1 =2
 q (2)  m (7)

2/28/2024 Dr. D. P. Acharjya 21


Continued …
 For q = 2 and i = 7
 q>0 and P[3]T[7](False)
 P[3] = T[7] (True)
 q = q + 1 =3
 q (3)  m (7)

2/28/2024 Dr. D. P. Acharjya 22


Continued …
 For q = 3 and i = 8
 q>0 and P[4]T[8](False)
 P[4] = T[8] (True)
 q = q + 1 =4
 q (4)  m (7)

2/28/2024 Dr. D. P. Acharjya 23


Continued …
 For q = 4 and i = 9
 q>0 and P[5]T[9](False)
 P[5] = T[9] (True)
 q = q + 1=5
 q (5)  m (7)

2/28/2024 Dr. D. P. Acharjya 24


Continued …
 For q = 5 and i = 10
 q>0 and P[6]T[10](True)
 q = [q =5] = 3
 q>0 and P[4]T[10](False)
 P[4] = T[10] (True)
 q = q + 1=4
 q (4)  m (7)

2/28/2024 Dr. D. P. Acharjya 25


Continued …
 For q = 4 and i = 11
 q>0 and P[5]T[11](False)
 P[5] = T[11] (True)
 q = q + 1=5
 q (5)  m (7)

2/28/2024 Dr. D. P. Acharjya 26


Continued …
 For q = 5 and i = 12
 q>0 and P[6]T[12](False)
 P[6] = T[12] (True)
 q = q + 1=6
 q (6)  m (7)

2/28/2024 Dr. D. P. Acharjya 27


Continued …
 For q = 6 and i = 13
 q>0 and P[7]T[13](False)
 P[7] = T[13] (True)
 q = q + 1=7
 q (7) = m (7)
 Pattern with shift 13 – 7 = 6
 q = [q =7] = 1

2/28/2024 Dr. D. P. Acharjya 28


Continued …
 For q = 1 and i = 14
 q>0 and P[2]T[14](True)
 q = [q=1] = 0
 P[2] = T[14] (False)
 q (0) = m (7)

2/28/202 Dr. D. P. Acharjya 29


4
Continued …
 For q = 0 and i = 15
 q>0 and P[1]T[15](False)
 P[1] = T[15] (True)
 q (0) = m (7) (False)
 Process Terminates

2/28/2024 Dr. D. P. Acharjya 30

You might also like