
Design and Analysis of Algorithms

String Matching Problem


(Rabin-Karp Algorithm)

Dr. D. P. Acharjya
Professor, SCOPE
SJT Annex – 201 E
Dr. D. P. Acharjya 2/26/2024 11
String Matching Problem
• The string-matching problem is the problem of finding all
  occurrences of a pattern (a string) in a text.
• It arises frequently in text-editing programs, where the text
  is a document and the pattern is a particular word being
  searched for.
• String-matching algorithms are also used to search for
  particular patterns in DNA sequences.



Mathematical Formulation
• Assume that the text is an array T[1..n] of length n and the
  pattern is an array P[1..m] of length m, where m ≤ n.
• The elements of P and T are characters drawn from a finite
  alphabet Σ. For example, Σ = {0, 1} or Σ = {a, b, …, z}.
• The character arrays P and T are often called strings of
  characters.
Continued…
• The pattern P occurs with shift s in text T (equivalently, P
  occurs beginning at position s + 1 in T) if
      0 ≤ s ≤ n − m and T[s+1..s+m] = P[1..m],
  that is, T[s + j] = P[j] for 1 ≤ j ≤ m.
• If P occurs with shift s in T, then s is a valid shift;
  otherwise, s is an invalid shift.
• The string-matching problem is the problem of finding
  all valid shifts with which a given pattern P occurs in
  a given text T.
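
The definition of a valid shift can be checked directly, one shift at a time. The following is a minimal Python sketch, not part of the original slides; the function name `valid_shifts` is an assumption chosen for illustration. Python strings are 0-indexed, so the slice `text[s:s + m]` plays the role of T[s+1..s+m].

```python
def valid_shifts(text: str, pattern: str) -> list[int]:
    """Return every shift s with 0 <= s <= n - m and T[s+1..s+m] = P[1..m]."""
    n, m = len(text), len(pattern)
    # range(n - m + 1) is empty when m > n, so no shift is reported in that case.
    return [s for s in range(n - m + 1) if text[s:s + m] == pattern]


shifts = valid_shifts("2359023141526739921", "31415")
print(shifts)
```

This brute-force check compares up to m characters at each of the n − m + 1 shifts; it serves only as a reference against which faster matchers can be compared.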



Rabin-Karp Algorithm
RABIN-KARP-MATCHER(T, P, d, q)
    n ← Length[T]
    m ← Length[P]
    h ← d^{m−1} mod q
    p ← 0
    t_0 ← 0
    for i ← 1 to m                      (Preprocessing)
        do p ← (d·p + P[i]) mod q
           t_0 ← (d·t_0 + T[i]) mod q



Continued …
    for s ← 0 to n − m                  (Matching)
        do if p = t_s
              then if P[1..m] = T[s+1..s+m]
                      then print “Pattern occurs with shift s”
           if s < n − m
              then t_{s+1} ← (d·(t_s − T[s+1]·h) + T[s+m+1]) mod q
• The preprocessing step takes Θ(m) time.
• The matching loop runs n − m + 1 times.
• In the worst case, every shift s requires the m comparisons
  P[j] = T[s + j], 1 ≤ j ≤ m.
• The worst-case computing time of matching is therefore
  O((n − m + 1)m).
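
The pseudocode translates almost line by line into Python. The following is a sketch under stated assumptions rather than a definitive implementation: the function name `rabin_karp` and the `to_digit` parameter (which maps a character to its numeric value) are introduced here for illustration and do not appear in the slides. Indices are 0-based, so `text[s]` corresponds to T[s+1] and `text[s + m]` to T[s+m+1].

```python
def rabin_karp(text, pattern, d, q, to_digit=ord):
    """Return all valid shifts of pattern in text (0-based Python strings)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:          # the slides assume 1 <= m <= n
        return []
    h = pow(d, m - 1, q)         # h = d^(m-1) mod q
    p = t = 0
    for i in range(m):           # preprocessing: Theta(m)
        p = (d * p + to_digit(pattern[i])) % q
        t = (d * t + to_digit(text[i])) % q
    shifts = []
    for s in range(n - m + 1):   # matching: n - m + 1 iterations
        # Compare characters only when the hash values agree.
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Drop the high-order digit text[s], append the new digit text[s + m].
            t = (d * (t - to_digit(text[s]) * h) + to_digit(text[s + m])) % q
    return shifts


result = rabin_karp("2359023141526739921", "31415", d=10, q=13, to_digit=int)
print(result)
```

Python's `%` operator returns a non-negative remainder for a positive modulus, so the subtraction inside the rolling update needs no extra correction here. In languages where `%` can return a negative value (C or Java, for example), q would have to be added back before the final reduction.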
Numerical Illustration
• Consider the text T = 2359023141526739921. Find all the valid
  shifts for the pattern P = 31415.
• Here Σ = {0, 1, 2, …, 9}
• Let q = 13 (prime)
• d = |Σ| = 10
• Here n = Length[T] = 19
• m = Length[P] = 5
• h = d^{m−1} mod q = 10^4 mod 13 = 3
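
The value of h can be confirmed with a short Python check. This is only an illustrative sketch; the variable names are assumptions. The three-argument form of the built-in `pow` performs modular exponentiation without ever building the full power, which matters when m is large.

```python
d, q, m = 10, 13, 5

h_direct = (d ** (m - 1)) % q   # 10000 % 13
h_modpow = pow(d, m - 1, q)     # same value, reduced modulo q at every step

print(h_direct, h_modpow)
```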
Continued … (Preprocessing)
• p = 0 and t_0 = 0
• For i = 1
    p   = (d·p + P[1]) % 13 = (10·0 + 3) % 13 = 3
    t_0 = (d·t_0 + T[1]) % 13 = (10·0 + 2) % 13 = 2
• For i = 2
    p   = (d·p + P[2]) % 13 = (10·3 + 1) % 13 = 5
    t_0 = (d·t_0 + T[2]) % 13 = (10·2 + 3) % 13 = 10



Continued …
• For i = 3
    p   = (d·p + P[3]) % 13 = (10·5 + 4) % 13 = 2
    t_0 = (d·t_0 + T[3]) % 13 = (10·10 + 5) % 13 = 1
• For i = 4
    p   = (d·p + P[4]) % 13 = (10·2 + 1) % 13 = 8
    t_0 = (d·t_0 + T[4]) % 13 = (10·1 + 9) % 13 = 6
Continued …
• For i = 5
    p   = (d·p + P[5]) % 13 = (10·8 + 5) % 13 = 7
    t_0 = (d·t_0 + T[5]) % 13 = (10·6 + 0) % 13 = 8
• After preprocessing, p = 7 and t_0 = 8.



Continued … (Matching)
• For s = 0
    p = 7 and t_0 = 8, so p = t_0 is false.
    s = 0 < n − m = 14, so compute t_1:
    t_1 = (d·(t_0 − T[s+1]·h) + T[s+m+1]) % q
        = (10·(8 − T[1]·3) + T[0+5+1]) % 13
        = (10·(8 − 2·3) + 2) % 13 = 22 % 13 = 9
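
The update from t_s to t_{s+1} is a constant-time operation: remove the contribution of the outgoing high-order digit, shift by one position, and append the incoming digit. A minimal sketch follows; the function name `roll` and its parameter names are assumptions for illustration only.

```python
def roll(t_s: int, outgoing: int, incoming: int, d: int, h: int, q: int) -> int:
    """Return t_{s+1} = (d * (t_s - outgoing * h) + incoming) mod q."""
    return (d * (t_s - outgoing * h) + incoming) % q


# s = 0: outgoing digit T[1] = 2, incoming digit T[6] = 2.
t1 = roll(8, outgoing=2, incoming=2, d=10, h=3, q=13)
print(t1)
```

When the bracketed difference is negative, Python's `%` still returns a value between 0 and q − 1, which is the remainder the hand calculation uses.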



Continued …
• For s = 1
    p = 7 and t_1 = 9, so p = t_1 is false.
    s = 1 < n − m = 14, so compute t_2:
    t_2 = (d·(t_1 − T[2]·h) + T[1+5+1]) % 13
        = (10·(9 − 3·3) + 3) % 13 = 3 % 13 = 3



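Continued …
• For s = 2
    p = 7 and t_2 = 3, so p = t_2 is false.
    s = 2 < n − m = 14, so compute t_3:
    t_3 = (d·(t_2 − T[3]·h) + T[2+5+1]) % 13
        = (10·(3 − 5·3) + 1) % 13 = (−119) % 13 = 11
    (The remainder is always taken in the range 0 to 12:
    −119 + 10·13 = 11.)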



Continued …
• For s = 3
    p = 7 and t_3 = 11, so p = t_3 is false.
    s = 3 < n − m = 14, so compute t_4:
    t_4 = (d·(t_3 − T[4]·h) + T[3+5+1]) % 13
        = (10·(11 − 9·3) + 4) % 13 = (−156) % 13 = 0



Continued …
• For s = 4
    p = 7 and t_4 = 0, so p = t_4 is false.
    s = 4 < n − m = 14, so compute t_5:
    t_5 = (d·(t_4 − T[5]·h) + T[4+5+1]) % 13
        = (10·(0 − 0·3) + 1) % 13 = 1 % 13 = 1



Continued …
• For s = 5
    p = 7 and t_5 = 1, so p = t_5 is false.
    s = 5 < n − m = 14, so compute t_6:
    t_6 = (d·(t_5 − T[6]·h) + T[5+5+1]) % 13
        = (10·(1 − 2·3) + 5) % 13 = (−45) % 13 = 7



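Continued …
• For s = 6
    p = 7 and t_6 = 7, so p = t_6 is true.
    Verification: T[7..11] = 31415 = P[1..5], so the pattern
    occurs with shift s = 6.
    s = 6 < n − m = 14, so compute t_7:
    t_7 = (d·(t_6 − T[7]·h) + T[6+5+1]) % 13
        = (10·(7 − 3·3) + 2) % 13 = (−18) % 13 = 8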



Continued …
• For s = 7
    p = 7 and t_7 = 8, so p = t_7 is false.
    s = 7 < n − m = 14, so compute t_8:
    t_8 = (d·(t_7 − T[8]·h) + T[7+5+1]) % 13
        = (10·(8 − 1·3) + 6) % 13 = 56 % 13 = 4



Continued …
• For s = 8
    p = 7 and t_8 = 4, so p = t_8 is false.
    s = 8 < n − m = 14, so compute t_9:
    t_9 = (d·(t_8 − T[9]·h) + T[8+5+1]) % 13
        = (10·(4 − 4·3) + 7) % 13 = (−73) % 13 = 5



Continued …
• For s = 9
    p = 7 and t_9 = 5, so p = t_9 is false.
    s = 9 < n − m = 14, so compute t_10:
    t_10 = (d·(t_9 − T[10]·h) + T[9+5+1]) % 13
         = (10·(5 − 1·3) + 3) % 13 = 23 % 13 = 10



Continued …
• For s = 10
    p = 7 and t_10 = 10, so p = t_10 is false.
    s = 10 < n − m = 14, so compute t_11:
    t_11 = (d·(t_10 − T[11]·h) + T[10+5+1]) % 13
         = (10·(10 − 5·3) + 9) % 13 = (−41) % 13 = 11



Continued …
• For s = 11
    p = 7 and t_11 = 11, so p = t_11 is false.
    s = 11 < n − m = 14, so compute t_12:
    t_12 = (d·(t_11 − T[12]·h) + T[11+5+1]) % 13
         = (10·(11 − 2·3) + 9) % 13 = 59 % 13 = 7



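Continued …
• For s = 12
    p = 7 and t_12 = 7, so p = t_12 is true.
    Verification: T[13..17] = 67399 ≠ 31415, that is,
    P[j] ≠ T[s + j] for some j. This is a spurious hit, and
    s = 12 is not a valid shift.
    s = 12 < n − m = 14, so compute t_13:
    t_13 = (d·(t_12 − T[13]·h) + T[12+5+1]) % 13
         = (10·(7 − 6·3) + 2) % 13 = (−108) % 13 = 9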
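
The hit at s = 12 shows why equal hash values do not by themselves establish a match. The short check below is an illustrative sketch with assumed variable names: the two five-digit windows leave the same remainder modulo 13, yet they are different strings, so the explicit comparison is required.

```python
q = 13
pattern = "31415"
window = "67399"   # T[13..17] in the text 2359023141526739921

same_hash = int(pattern) % q == int(window) % q
same_text = pattern == window

print(same_hash, same_text)
```

A larger prime q makes such collisions less frequent, but the character-by-character verification is still needed for correctness.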



Continued …
• For s = 13
    p = 7 and t_13 = 9, so p = t_13 is false.
    s = 13 < n − m = 14, so compute t_14:
    t_14 = (d·(t_13 − T[14]·h) + T[13+5+1]) % 13
         = (10·(9 − 7·3) + 1) % 13 = (−119) % 13 = 11



Continued …
• For s = 14
    p = 7 and t_14 = 11, so p = t_14 is false.
    s = 14 < n − m = 14 is false, so no further value is computed.
• The process terminates. The only valid shift is s = 6.
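
The whole trace can be regenerated in a few lines, which is a convenient way to check a hand computation. This is a sketch under assumptions: the helper name `window_hashes` is hypothetical, and digits are stored in a 0-indexed Python list, so `digits[s]` is T[s+1].

```python
def window_hashes(digits, m, d, q):
    """Return [t_0, t_1, ..., t_{n-m}] for the length-m windows of digits."""
    h = pow(d, m - 1, q)
    t = 0
    for x in digits[:m]:                 # t_0 by Horner's rule
        t = (d * t + x) % q
    hashes = [t]
    for s in range(len(digits) - m):     # rolling update for s = 0..n-m-1
        t = (d * (t - digits[s] * h) + digits[s + m]) % q
        hashes.append(t)
    return hashes


text, pattern = "2359023141526739921", "31415"
T = [int(c) for c in text]
p = int(pattern) % 13

hashes = window_hashes(T, m=5, d=10, q=13)
hits = [s for s, t in enumerate(hashes) if t == p]          # hash agreements
valid = [s for s in hits if text[s:s + 5] == pattern]       # verified matches
print(hashes, hits, valid)
```

The list `hits` contains every shift where the hash values agree, while `valid` keeps only those that survive the character comparison.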
