UNIT-5 DAA Complete Notes
Branch - IT
Design and Analysis of Algorithms
UNIT-5
String Matching Algorithms
Basic Definition of String Matching
A String Matching Algorithm is also called a "String Searching Algorithm." Formally: given a text T[1..n] and a pattern P[1..m] with m ≤ n, find all valid shifts s, 0 ≤ s ≤ n − m, such that T[s+1..s+m] = P[1..m].
Analysis: The outer loop of the naïve matcher executes n − m + 1 times (the pattern must fit inside the text, so there are only n − m + 1 candidate shifts), and each iteration performs up to m character comparisons.
So the total worst-case complexity is O((n − m + 1)m).
Example on Naïve String Matching Algorithm:
Let T = 1011101110 and P = 111. Find the valid shifts using the Naïve String Matching Algorithm.
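The sliding-window idea can be sketched in Python (0-indexed shifts, unlike the 1-indexed pseudocode convention of these notes; the function name is illustrative):

```python
# Illustrative sketch of the naive string matcher: try every shift and
# compare the window against the pattern character by character.
def naive_match(T, P):
    n, m = len(T), len(P)
    shifts = []
    for s in range(n - m + 1):          # n - m + 1 candidate shifts
        if T[s:s + m] == P:             # up to m character comparisons
            shifts.append(s)
    return shifts

print(naive_match("1011101110", "111"))  # valid shifts: [2, 6]
```

For this T and P the pattern occurs at shifts 2 and 6 (0-indexed).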
Rabin Karp String Matching Algorithm:
Rabin-Karp Algorithm: The Rabin-Karp string matching algorithm calculates a hash value for the pattern, as well as for each m-character substring of the text to be compared. If the hash values are unequal, the algorithm simply moves on and determines the hash value for the next m-character substring.
If the hash values are equal, the algorithm compares the pattern and the m-character substring character by character. In this way, there is only one hash comparison per text substring, and character matching is required only when the hash values match. The numerical (hash) values are computed using Horner's rule.
RABIN-KARP-MATCHER (T, P, d, q)
1. n ← length [T]
2. m ← length [P]
3. h ← d^(m-1) mod q
4. p ← 0
5. t0 ← 0
6. for i ← 1 to m
7.     do p ← (dp + P[i]) mod q
8.        t0 ← (dt0 + T[i]) mod q
9. for s ← 0 to n-m
10.     do if p = ts
11.        then if P[1.....m] = T[s+1.....s+m]
12.            then print "Pattern occurs with shift" s
13.        if s < n-m
14.            then ts+1 ← (d(ts - T[s+1]h) + T[s+m+1]) mod q
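The pseudocode above can be sketched in Python (0-indexed; the default radix d and prime q below are illustrative choices, not mandated by the notes):

```python
# A sketch of RABIN-KARP-MATCHER: d is the alphabet size, q a prime
# modulus. Returns all valid shifts, verifying each hash hit character
# by character to rule out spurious hits.
def rabin_karp(T, P, d=256, q=101):
    n, m = len(T), len(P)
    if m > n:
        return []
    h = pow(d, m - 1, q)                 # h = d^(m-1) mod q
    p = t = 0
    for i in range(m):                   # preprocessing: hash P and T[0..m-1]
        p = (d * p + ord(P[i])) % q
        t = (d * t + ord(T[i])) % q
    shifts = []
    for s in range(n - m + 1):
        if p == t and T[s:s + m] == P:   # hash hit, then verify characters
            shifts.append(s)
        if s < n - m:                    # roll the hash to the next window
            t = (d * (t - ord(T[s]) * h) + ord(T[s + m])) % q
    return shifts

print(rabin_karp("1011101110", "111"))   # [2, 6]
```

The rolling update in the last line of the loop is exactly step 14 of the pseudocode, with Python's `%` keeping the result non-negative.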
Example on Rabin Karp String Matching Algorithm:
The rolling-hash update used to move from shift s to shift s+1 is derived using Horner's rule:
ts+1 ← (d(ts - T[s+1]h) + T[s+m+1]) mod q, where h = d^(m-1) mod q.
Example: With working modulus q = 11, how many spurious hits does the Rabin-Karp matcher encounter in the text T = 3141592653589793 when looking for the pattern P = 26?
Solution: Given
Text T = 3141592653589793
Pattern P=26.
Working Modulo q = 11
We calculate P mod q = 26 mod 11 = 4.
Now we look for windows of the text whose hash value equals P mod q = 4; each such hit must be verified character by character.
We keep traversing the text, comparing P mod q with the hash of each m-character window of T. When the hash values are equal, there are two cases.
Case 1: The characters also match — a valid match, i.e., an exact occurrence of P.
Case 2: The characters do not match — a spurious hit, and we continue searching.
When the hash values differ, no character comparison is needed at all. We move to the next window of the text using the rolling-hash update obtained from Horner's rule.
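The worked example above can be checked directly in Python (a brute-force hash scan rather than the rolling update, purely to count the hits):

```python
# Reproducing the worked example: T = 3141592653589793, P = 26, q = 11.
# Every window whose hash equals P mod q but whose digits differ from P
# is a spurious hit.
T, P, q = "3141592653589793", "26", 11
target = int(P) % q                      # 26 mod 11 = 4
spurious, valid = [], []
for s in range(len(T) - len(P) + 1):
    window = T[s:s + len(P)]
    if int(window) % q == target:
        (valid if window == P else spurious).append(s)
print("valid shifts:", valid)            # [6]  (window "26")
print("spurious hits:", spurious)        # [3, 4, 5] -> windows 15, 59, 92
```

So the matcher encounters 3 spurious hits (windows 15, 59, and 92, all ≡ 4 mod 11) and one valid match at shift 6.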
Complexity: The running time of RABIN-KARP-MATCHER in the worst case is O((n − m + 1)m), but it has a good average-case running time.
If the expected number of valid shifts is small, O(1), and the prime q is chosen to be quite large, then the Rabin-Karp algorithm can be expected to run in time O(n + m), plus the time required to process spurious hits.
Example 2: Suppose Text T = 2359023141526739921 , Pattern P= 31415 and
Working modulo q=13. Find the valid match using Rabin Karp Algorithm.
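Example 2 can be checked the same way (a brute-force hash scan over all windows; here P mod q = 31415 mod 13 = 7):

```python
# Checking Example 2: find hash hits for P = 31415 under q = 13,
# then keep only the ones whose characters actually match.
T, P, q = "2359023141526739921", "31415", 13
target = int(P) % q
hits = [s for s in range(len(T) - len(P) + 1)
        if int(T[s:s + len(P)]) % q == target]
valid = [s for s in hits if T[s:s + len(P)] == P]
print(hits, valid)   # hash hits at shifts 6 and 12; only shift 6 is valid
```

The window "67399" at shift 12 is a spurious hit (67399 ≡ 7 mod 13), while the valid match is at shift 6.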
String Matching: Knuth-Morris-Pratt Algorithm
Knuth-Morris-Pratt Algorithm: Knuth, Morris, and Pratt introduced a linear-time algorithm for the string matching problem.
A matching time of O(n) is achieved by avoiding comparisons with characters of 'S' that have previously been involved in a comparison with some character of the pattern 'p'; i.e., backtracking on the string 'S' never occurs.
Components of KMP Algorithm:
1. The Prefix Function (Π): The Prefix Function, Π, for a pattern encapsulates knowledge about how the pattern matches against shifts of itself.
This information can be used to avoid useless shifts of the pattern 'p'. In other words, it enables avoiding backtracking on the string 'S'.
2. The KMP Matcher: With the string 'S', the pattern 'p', and the prefix function 'Π' as inputs, it finds the occurrences of 'p' in 'S' and returns the shifts of 'p' at which occurrences are found.
Knuth-Morris-Pratt (KMP) is an algorithm that checks the characters from left to right.
For Example:
Input: Main String: "AAAABAAAAABBBAAAAB", Pattern: "AAAB"
Output:
Pattern found at location: 1
Pattern found at location: 7
Pattern found at location: 14
Algorithm for Computing the Prefix Function (Π):
COMPUTE- PREFIX- FUNCTION (P)
1. m ← length [P] // 'p' is the pattern to be matched
2. Π [1] ← 0
3. k ← 0
4. For q ← 2 to m
5. do while k > 0 and P [k + 1] ≠ P [q]
6. do k ← Π [k]
7. If P [k + 1] = P [q]
8. then k← k + 1
9. Π [q] ← k
10. Return Π
Running Time Analysis for Computing the Prefix Function:
In the above pseudocode for calculating the prefix function, the for loop from step 4 to step 10 runs m − 1 times, and the total work of the inner while loop is O(m) amortized, since k only decreases when it has previously been incremented. Steps 1 to 3 take constant time. Hence the running time of computing the prefix function is O(m). Example: Compute Π for the pattern 'p' below:
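The prefix-function pseudocode can be sketched in Python (0-indexed, so pi[q] is the length of the longest proper prefix of P that is also a suffix of P[0..q]). The pattern referred to by the slide is not shown in these notes, so the classic pattern "ababaca" is used here purely as an illustration:

```python
# A sketch of COMPUTE-PREFIX-FUNCTION (0-indexed variant).
def compute_prefix(P):
    m = len(P)
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and P[k] != P[q]:
            k = pi[k - 1]        # fall back to a shorter border
        if P[k] == P[q]:
            k += 1               # extend the current border
        pi[q] = k
    return pi

print(compute_prefix("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
```

For example, pi[4] = 3 because "aba" is both a proper prefix and a suffix of "ababa".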
KMP Matcher: The KMP matcher takes the pattern 'p', the string 'S', and the prefix function 'Π' as input, and finds the occurrences of p in S.
KMP-MATCHER (T, P)
1. n ← length [T]
2. m ← length [P]
3. Π← COMPUTE-PREFIX-FUNCTION (P)
4. q ← 0 // numbers of characters matched
5. for i ← 1 to n // scan S from left to right
6. do while q > 0 and P [q + 1] ≠ T [i]
7. do q ← Π [q] // next character does not match
8. If P [q + 1] = T [i]
9. then q ← q + 1 // next character matches
10. If q = m // is all of p matched?
11. then print "Pattern occurs with shift" i - m
12. q ← Π [q] // look for the next match
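The matcher pseudocode can be sketched in Python (0-indexed), reusing a prefix-function helper and run on the example given earlier in these notes:

```python
# A sketch of KMP-MATCHER (0-indexed variant).
def compute_prefix(P):
    pi, k = [0] * len(P), 0
    for q in range(1, len(P)):
        while k > 0 and P[k] != P[q]:
            k = pi[k - 1]
        if P[k] == P[q]:
            k += 1
        pi[q] = k
    return pi

def kmp_match(T, P):
    pi, q, shifts = compute_prefix(P), 0, []
    for i in range(len(T)):
        while q > 0 and P[q] != T[i]:
            q = pi[q - 1]              # next character does not match
        if P[q] == T[i]:
            q += 1                     # next character matches
        if q == len(P):                # all of P matched
            shifts.append(i - len(P) + 1)
            q = pi[q - 1]              # look for the next match
    return shifts

print(kmp_match("AAAABAAAAABBBAAAAB", "AAAB"))  # [1, 7, 14]
```

This reproduces the earlier example's output: the pattern "AAAB" is found at locations 1, 7, and 14.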
Running Time Analysis: The for loop beginning in step 5 runs 'n' times, i.e., as long as the length of the string 'S'. Since steps 1 to 4 take constant time, the running time is dominated by this for loop. Thus the running time of the matching function is O(n).
If m is the length of the pattern and n is the length of the text to be searched, then the total running time of the KMP algorithm is O(m + n): O(m) to compute the prefix function and O(n) for the matching.
RANDOMIZED-PARTITION (A, p, r):
1. i ← RANDOM(p, r)
2. exchange A[r] ↔ A[i]
3. return PARTITION(A, p, r)
RANDOMIZED-QUICKSORT(A, p, r):
1. If p < r
2. then q ← RANDOMIZED-PARTITION(A, p, r)
3. RANDOMIZED-QUICKSORT(A, p, q - 1)
4. RANDOMIZED-QUICKSORT(A, q + 1, r)
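The two procedures above can be sketched in Python; the partition step below uses the standard Lomuto scheme as an assumed concrete choice, since the notes do not show PARTITION itself:

```python
# A sketch of RANDOMIZED-QUICKSORT. Picking the pivot uniformly at
# random makes the expected running time O(n lg n) on every input.
import random

def randomized_partition(A, p, r):
    i = random.randint(p, r)         # RANDOM(p, r), inclusive bounds
    A[r], A[i] = A[i], A[r]          # exchange A[r] <-> A[i]
    x, j = A[r], p - 1               # Lomuto partition around pivot A[r]
    for k in range(p, r):
        if A[k] <= x:
            j += 1
            A[j], A[k] = A[k], A[j]
    A[j + 1], A[r] = A[r], A[j + 1]
    return j + 1

def randomized_quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)

data = [7, 2, 9, 4, 1, 8]
randomized_quicksort(data, 0, len(data) - 1)
print(data)  # [1, 2, 4, 7, 8, 9]
```

The only difference from plain quicksort is the random pivot exchange; the partition itself is unchanged.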
Randomized version of Quick Sort:
Analysis of Randomized Quicksort: We discuss here the worst-case and expected complexity of the randomized quicksort algorithm.
Worst-case complexity: Let T(n) be the worst-case time for the procedure QUICKSORT on an input of size n. We have the recurrence T(n) = T(n − 1) + Θ(n), which arises when every partition is maximally unbalanced; it solves to T(n) = Θ(n²). The expected running time, however, is O(n lg n), since a uniformly random pivot makes consistently unbalanced partitions unlikely.
Set Cover Problem : Approximate Algorithm
Given a universe U of n elements and a collection of subsets of U, say S = {S1, S2, …, Sm}, where every subset Si has an associated cost, find a minimum-cost subcollection of S that covers all elements of U.
Example:
U = {1,2,3,4,5}
S = {S1,S2,S3}
S1 = {4,1,3}, Cost(S1) = 5
S2 = {2,5}, Cost(S2) = 10
S3 = {1,4,3,2}, Cost(S3) = 3
Output: The minimum cost of a set cover is 13, and the set cover is {S2, S3}. There are two possible set covers: {S1, S2} with cost 15 and {S2, S3} with cost 13.
Set Cover is NP-Hard: No polynomial-time exact algorithm is known for this problem, as it is a known NP-Hard problem. There is, however, a polynomial-time greedy algorithm that provides a log n approximation.
Greedy Approximation Algorithm: Let U be the universe of elements, {S1, S2, …, Sm} be a collection of subsets of U, and Cost(S1), Cost(S2), …, Cost(Sm) be the costs of the subsets. Start with an empty cover I; while I ≠ U, pick the subset Si minimizing the cost per newly covered element, Cost(Si)/|Si − I|, and add its elements to I.
U = {1,2,3,4,5}, S = {S1,S2,S3}
S1 = {4,1,3}, Cost(S1) = 5
S2 = {2,5}, Cost(S2) = 10
S3 = {1,4,3,2}, Cost(S3) = 3
U = {1,2,3,4,5}, S = {S1,S2,S3}
S1 = {4,1,3}, Cost(S1) = 5
S2 = {2,5}, Cost(S2) = 10
S3 = {1,4,3,2}, Cost(S3) = 3
Example: Let us consider the above example to understand Greedy Algorithm.
First Iteration: I = {}
The per new element cost for S1 = Cost(S1)/|S1 – I| = 5/3
The per new element cost for S2 = Cost(S2)/|S2 – I| = 10/2
The per new element cost for S3 = Cost(S3)/|S3 – I| = 3/4
Since S3 has minimum value S3 is added, I becomes {1,4,3,2}.
Second Iteration: I = {1,4,3,2}.
The per new element cost for S1 = Cost(S1)/|S1 – I| = 5/0; S1 does not add any new element, so it is skipped.
The per new element cost for S2 = Cost(S2)/|S2 – I| = 10/1; S2 adds only the element 5.
S2 is therefore added, I becomes {1,2,3,4,5} = U, and the greedy set cover is {S3, S2} with total cost 3 + 10 = 13.
The greedy algorithm finds the optimal solution for the above example, but it may not find the optimal solution every time.
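The greedy iterations above can be sketched in Python (the function name and data layout are illustrative):

```python
# A sketch of the greedy set-cover heuristic: repeatedly pick the subset
# with the smallest cost per newly covered element.
def greedy_set_cover(universe, subsets, costs):
    covered, chosen, total = set(), [], 0
    while covered != universe:
        # cheapest cost per new element; subsets adding nothing are skipped
        best = min((name for name in subsets if subsets[name] - covered),
                   key=lambda name: costs[name] / len(subsets[name] - covered))
        covered |= subsets[best]
        chosen.append(best)
        total += costs[best]
    return chosen, total

U = {1, 2, 3, 4, 5}
S = {"S1": {4, 1, 3}, "S2": {2, 5}, "S3": {1, 4, 3, 2}}
cost = {"S1": 5, "S2": 10, "S3": 3}
print(greedy_set_cover(U, S, cost))  # (['S3', 'S2'], 13)
```

The first iteration picks S3 (ratio 3/4), the second picks S2 (ratio 10/1), matching the hand trace and the total cost of 13.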
Fast Fourier Transform(FFTs)
Complex roots of unity: A complex n-th root of unity is a complex number ω such that ω^n = 1.
There are exactly n complex n-th roots of unity: e^(2πik/n) for k = 0, 1, …, n − 1.
To interpret this formula, we use the definition of the exponential of a complex number: e^(iu) = cos(u) + i sin(u).
The value ω_n = e^(2πi/n) is the principal n-th root of unity; all other complex n-th roots of unity are powers of ω_n. The n complex n-th roots of unity are equally spaced around the unit circle in the complex plane.
Recursive FFT: By using a method known as the fast Fourier transform (FFT), which takes advantage of the special properties of the complex roots of unity, we can compute the discrete Fourier transform DFT_n(a) in time Θ(n lg n), as opposed to the Θ(n²) time of the straightforward method. We assume throughout that n is an exact power of 2.
The FFT method employs a divide-and-conquer strategy, using the even-indexed and odd-indexed coefficients of A(x) separately to define two new polynomials A^[0](x) and A^[1](x) of degree-bound n/2:
A^[0](x) = a0 + a2·x + a4·x² + … + a(n−2)·x^(n/2−1)
A^[1](x) = a1 + a3·x + a5·x² + … + a(n−1)·x^(n/2−1)
so that A(x) = A^[0](x²) + x·A^[1](x²).
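The divide-and-conquer recursion can be sketched in Python (n must be a power of 2; the function name is illustrative):

```python
# A sketch of the recursive FFT: split A(x) into even- and odd-indexed
# coefficient halves, recurse, then combine the half-size DFTs using
# powers of the principal n-th root of unity w_n = e^(2*pi*i/n).
import cmath

def recursive_fft(a):
    n = len(a)
    if n == 1:
        return a
    wn = cmath.exp(2j * cmath.pi / n)    # principal n-th root of unity
    w = 1
    y0 = recursive_fft(a[0::2])          # DFT of A^[0] (even coefficients)
    y1 = recursive_fft(a[1::2])          # DFT of A^[1] (odd coefficients)
    y = [0] * n
    for k in range(n // 2):
        y[k] = y0[k] + w * y1[k]         # A(w^k) = A0(w^2k) + w^k * A1(w^2k)
        y[k + n // 2] = y0[k] - w * y1[k]
        w *= wn
    return y

# DFT of the constant signal [1, 1, 1, 1]: all energy in the zero bin.
print(recursive_fft([1, 1, 1, 1]))       # approx [4, 0, 0, 0]
```

Each level of recursion does Θ(n) combining work over lg n levels, giving the Θ(n lg n) bound stated above.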