
The Rabin-Karp Algorithm

String Matching

Jonathan M. Elchison
19 November 2004
CS-3410 Algorithms
Dr. Shomper

Background
String matching
Naïve method
n = size of input string
m = size of pattern to be matched
O( (n-m+1)m )
Θ( n² ) if m = ⌊n/2⌋
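A minimal sketch of the naïve method, to make the (n-m+1)·m bound concrete. The function name `naive_match` and the 0-based shifts are assumptions of this example, not part of the slides.

```python
def naive_match(text: str, pattern: str) -> list[int]:
    """Return every 0-based shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    # There are n - m + 1 candidate shifts.
    for s in range(n - m + 1):
        # Each slice comparison costs up to m character comparisons.
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


print(naive_match("BANANA", "ANA"))  # [1, 3]
```

Every shift pays for a full comparison, which is the cost Rabin-Karp tries to avoid.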

We can do better

How it works
Consider a hashing scheme
Each symbol in alphabet can be represented by
an ordinal value { 0, 1, 2, ..., d-1 }
|Σ| = d
Radix-d digits

How it works
Hash pattern P into a numeric value
Let a string be represented by the sum of these
digits
Horner's rule (§ 30.1)

Example
{ A, B, C, ..., Z } → { 0, 1, 2, ..., 25 }
BAN → 1 + 0 + 13 = 14
CARD → 2 + 0 + 17 + 3 = 22
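The example above can be reproduced with a short sketch. It assumes the alphabet A..Z mapped to 0..25, and the helper names (`ordinal`, `digit_sum`, `horner_value`) are hypothetical. `digit_sum` is the simplified sum used in the example; `horner_value` is the full radix-d value that Horner's rule computes.

```python
def ordinal(ch: str) -> int:
    """Map 'A'..'Z' to 0..25 (assumed alphabet for this sketch)."""
    return ord(ch) - ord("A")


def digit_sum(s: str) -> int:
    """Simplified value from the example: the sum of the ordinals."""
    return sum(ordinal(ch) for ch in s)


def horner_value(s: str, d: int = 26) -> int:
    """Radix-d value of s, evaluated left to right with Horner's rule."""
    value = 0
    for ch in s:
        value = d * value + ordinal(ch)
    return value


print(digit_sum("BAN"), digit_sum("CARD"))  # 14 22
print(horner_value("BAN"))                  # 689 = 1*26^2 + 0*26 + 13
```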

Upper limits
Problem
For long patterns, or for large alphabets, the number
representing a given string may be too large to be practical

Solution
Use MOD operation
When MOD q, values will be < q

Example
BAN → 1 + 0 + 13 = 14
14 mod 13 = 1
BAN → 1

CARD → 2 + 0 + 17 + 3 = 22
22 mod 13 = 9
CARD → 9
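A sketch of the MOD step, assuming q = 13 and the A..Z → 0..25 mapping from the example (function names are hypothetical). `sum_hash` reproduces the values above. `horner_hash` applies the same reduction to the radix-d value, taking the modulus at every step so that no intermediate result exceeds d·q.

```python
Q = 13  # modulus used in the example


def ordinal(ch: str) -> int:
    """Map 'A'..'Z' to 0..25 (assumed alphabet)."""
    return ord(ch) - ord("A")


def sum_hash(s: str, q: int = Q) -> int:
    """Sum of ordinals, reduced mod q."""
    return sum(ordinal(ch) for ch in s) % q


def horner_hash(s: str, d: int = 26, q: int = Q) -> int:
    """Radix-d value mod q, reducing after every digit."""
    h = 0
    for ch in s:
        h = (d * h + ordinal(ch)) % q
    return h


print(sum_hash("BAN"), sum_hash("CARD"))  # 1 9
```

Reducing at each step gives the same result as reducing the full value once, because addition and multiplication are compatible with mod.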

Searching

Spurious Hits
Question
Does a hash value match mean that the patterns match?

Answer
No; these are called spurious hits

Possible cases
MOD operation interfered with uniqueness of hash values
14 mod 13 = 1
27 mod 13 = 1
MOD value q is usually chosen as a prime such that dq just fits
within 1 computer word (10q for decimal digits)

Information is lost in generalization (addition)


BAN → 1 + 0 + 13 = 14
CAM → 2 + 0 + 12 = 14

Code
RABIN-KARP-MATCHER( T, P, d, q )
    n ← length[ T ]
    m ← length[ P ]
    h ← d^(m-1) mod q
    p ← 0
    t_0 ← 0
    for i ← 1 to m                                  ▹ Preprocessing
        do p ← ( d*p + P[ i ] ) mod q
           t_0 ← ( d*t_0 + T[ i ] ) mod q
    for s ← 0 to n - m                              ▹ Matching
        do if p = t_s
              then if P[ 1..m ] = T[ s+1 .. s+m ]
                      then print "Pattern occurs with shift" s
           if s < n - m
              then t_(s+1) ← ( d * ( t_s - T[ s + 1 ] * h ) + T[ s + m + 1 ] ) mod q
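A runnable Python sketch of the pseudocode above. It is one possible translation, not the author's implementation: indices are 0-based, characters are converted to digits with `ord`, and the defaults d = 256 and q = 101 are arbitrary choices for this example. An empty pattern returns no shifts, which is also an assumption.

```python
def rabin_karp(text: str, pattern: str, d: int = 256, q: int = 101) -> list[int]:
    """Return every 0-based shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)  # d^(m-1) mod q, weight of the leading digit
    p = 0                 # hash of the pattern
    t = 0                 # hash of the current text window
    # Preprocessing: hash the pattern and the first window with Horner's rule.
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    shifts = []
    # Matching: compare hashes first, characters only on a hash hit.
    for s in range(n - m + 1):
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Roll the window: drop text[s], append text[s + m].
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts


print(rabin_karp("BANANA", "ANA"))  # [1, 3]
```

Python's `%` always returns a non-negative result for a positive modulus, so the subtraction in the rolling step needs no extra correction here; in languages where `%` can be negative, q would have to be added back.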

Performance
Preprocessing (determining each pattern hash)
Θ( m )

Worst case running time
Θ( (n-m+1)m )
No better than naïve method

Expected case
If we assume the number of hits is constant
(does not grow with n), we expect O( n )
Only hash hits are compared character by character, not all shifts
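The worst case can be observed by counting hash hits, since each hit costs up to m character comparisons. The sketch below is illustrative only: the function name `count_hash_hits` and the defaults d = 256, q = 101 are assumptions of this example.

```python
def count_hash_hits(text: str, pattern: str, d: int = 256, q: int = 101) -> int:
    """Count the shifts whose window hash equals the pattern hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return 0
    h = pow(d, m - 1, q)
    p = t = 0
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    hits = 0
    for s in range(n - m + 1):
        if p == t:
            hits += 1  # this shift would trigger an m-character check
        if s < n - m:
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return hits


# Worst case: every one of the n - m + 1 shifts is a hit.
print(count_hash_hits("A" * 20, "A" * 5))  # 16
```

When hits are rare, almost all shifts are dismissed by the constant-time hash comparison, which is where the expected O( n ) comes from.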

Demonstration
http://www-igm.univ-mlv.fr/~lecroq/string/node5.html

Sources:
Cormen, Thomas H., et al. Introduction to Algorithms. 2nd ed. Cambridge, MA: MIT Press, 2001.
Karp-Rabin algorithm. 15 Jan 1997. <http://www-igm.univ-mlv.fr/~lecroq/string/node5.html>.
Shomper, Keith. Rabin-Karp Animation. E-mail to Jonathan Elchison. 12 Nov 2004.
