PartB Report
Submitted by
Prof. Ganashree K C
Assistant Professor,
Dept. of CSE,
RV College of Engineering.
STRING MATCHING
ACKNOWLEDGEMENT
Lastly, we would also like to thank all the people and sources that
motivated us to take up this topic.
CONTENTS
1. Problem Statement
2. Naïve Algorithm
3. Horspool’s Algorithm
4. Rabin-Karp Algorithm
5. Knuth-Morris-Pratt Algorithm
6. Comparison
1. PROBLEM STATEMENT
String matching is one of the most frequently used operations in computer programs, so it is
important to determine efficient ways to find occurrences of a pattern in a text. This report explores
four different algorithms that can be applied to find a pattern P in an input text T. Finally, we
compare the complexity of these algorithms. Throughout, n denotes the length of the text T and m
the length of the pattern P.
2. NAÏVE ALGORITHM
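The naïve (brute-force) algorithm aligns the pattern with each possible starting position in the text,
from left to right, and compares the pattern with the text one character at a time until either a
mismatch occurs or the whole pattern has matched. It then shifts the pattern one position to the
right and repeats until every starting position has been tried, so it performs O(nm) character
comparisons in the worst case.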
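For illustration, here is a minimal Python sketch of the naïve approach. The name `naive_search` is hypothetical, and the sketch assumes that every (possibly overlapping) occurrence should be reported as a 0-based starting index.

```python
from typing import List


def naive_search(text: str, pattern: str) -> List[int]:
    """Return every starting index at which pattern occurs in text (brute force)."""
    n, m = len(text), len(pattern)
    matches: List[int] = []
    # Try every alignment of the pattern against the text.
    for s in range(n - m + 1):
        j = 0
        # Compare left to right until a mismatch or a complete match.
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:
            matches.append(s)
    return matches


print(naive_search("AABAACAADAABAABA", "AABA"))  # [0, 9, 12]
```

Each of the n - m + 1 alignments may cost up to m comparisons, which is where the O(nm) worst case comes from.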
3. HORSPOOL’S ALGORITHM
Horspool’s string matching algorithm finds a pattern (P) within a text (T). Unlike the naïve
algorithm, Horspool’s algorithm compares the pattern with the text from right to left, starting
with the text character aligned with the last pattern character. Before the search, the pattern is
preprocessed into a shift table: for each character c, the table stores the distance from the
rightmost occurrence of c among the first m - 1 characters of the pattern to the last pattern
position (where m is the length of the pattern), or m if c does not occur among them. After a
mismatch (or a complete match), the entire pattern is shifted to the right by the table value of the
text character that was aligned with the last pattern character, regardless of where the mismatch
occurred.
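The following Python sketch illustrates the shift table and the right-to-left comparison described above. The names `shift_table` and `horspool_search` are hypothetical; characters absent from the table default to a shift of m, and, unlike textbook versions that stop at the first match, this sketch keeps shifting after a match so that it reports every occurrence (an assumption about the desired output).

```python
from typing import Dict, List


def shift_table(pattern: str) -> Dict[str, int]:
    """Build Horspool's shift table from the first m - 1 pattern characters.

    Characters missing from the table get the default shift m in horspool_search.
    """
    m = len(pattern)
    table: Dict[str, int] = {}
    for i in range(m - 1):
        # Later (further right) occurrences overwrite earlier ones, so each
        # entry is the distance from the rightmost occurrence to position m - 1.
        table[pattern[i]] = m - 1 - i
    return table


def horspool_search(text: str, pattern: str) -> List[int]:
    """Return the starting index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    table = shift_table(pattern)
    matches: List[int] = []
    i = m - 1  # text index aligned with the last pattern character
    while i < n:
        k = 0
        # Compare pattern and text from right to left.
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            matches.append(i - m + 1)
        # Shift by the entry for the text character under the last pattern position.
        i += table.get(text[i], m)
    return matches


print(shift_table("AABA"))                           # {'A': 2, 'B': 1}
print(horspool_search("AABAACAADAABAABA", "AABA"))  # [0, 9, 12]
```

Using a dictionary with a default of m avoids allocating an entry for every character of the alphabet; an array indexed by character code would be the usual choice for a small fixed alphabet.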
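Complexity:
Building the shift table takes O(m + k) time, where k is the size of the alphabet. In the worst case
Horspool’s algorithm makes O(nm) character comparisons, but on random texts its average-case
running time is O(n).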
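4. RABIN-KARP ALGORITHM
The Rabin-Karp algorithm searches for a pattern in a text using a hash function. Unlike the naïve
algorithm, it does not compare every character at every shift. Instead, it compares the hash value
of the pattern with the hash value of each window of the text that has the same length as the
pattern, updating the window’s hash in constant time as the window slides (a rolling hash), and
performs a character-by-character comparison only where the two hash values are equal. This
filters out most non-matching positions before any characters are compared.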
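A minimal Python sketch of the rolling-hash idea follows. It assumes a polynomial hash with an illustrative base of 256 and modulus 10^9 + 7 (other choices work too); the function name `rabin_karp_search` is likewise hypothetical. Because different windows can share a hash value, each hash match is confirmed by comparing the characters.

```python
from typing import List


def rabin_karp_search(text: str, pattern: str,
                      base: int = 256, mod: int = 1_000_000_007) -> List[int]:
    """Return every starting index of pattern in text using a rolling hash.

    base and mod are illustrative choices, not values taken from the report.
    """
    n, m = len(text), len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    if m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod
    matches: List[int] = []
    for s in range(n - m + 1):
        # Equal hashes may be a spurious hit, so verify the characters.
        if p_hash == w_hash and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:
            # Roll the hash: remove text[s] and append text[s + m].
            w_hash = ((w_hash - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return matches


print(rabin_karp_search("AABAACAADAABAABA", "AABA"))         # [0, 9, 12]
print(rabin_karp_search("AABAACAADAABAABA", "AABA", mod=3))  # [0, 9, 12]
```

The second call uses a deliberately tiny modulus, which makes hash collisions likely; the character check still filters them out, but every collision costs an extra O(m) comparison.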
Complexity:
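The average- and best-case running time of the Rabin-Karp algorithm is O(n + m), but its
worst-case time is O(nm). The worst case occurs when many windows have the same hash value as the
pattern (spurious hits or genuine matches), since each of them must be verified character by
character.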
5. KNUTH-MORRIS-PRATT ALGORITHM
The KMP algorithm, named after Knuth, Morris and Pratt, who invented it, is a linear-time
algorithm for the string matching problem. It achieves O(n) search time by never backtracking in
the text: characters of the text T that have already matched characters of the pattern P are not
compared again. The basic idea behind KMP’s algorithm is: whenever we detect a mismatch (after
some matches), we already know some of the characters in the text of the next window. We take
advantage of this information to avoid comparing characters that we know will match anyway.
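The Python sketch below shows one common way to implement KMP; the helper names `prefix_table` and `kmp_search` are hypothetical. The prefix table (also called the failure function) stores, for each pattern position j, the length of the longest proper prefix of P[0..j] that is also a suffix of it. On a mismatch the search falls back to that length within the pattern instead of moving backwards in the text, which is how the "information we already know" is reused.

```python
from typing import List


def prefix_table(pattern: str) -> List[int]:
    """lps[j] = length of the longest proper prefix of pattern[:j + 1]
    that is also a suffix of it."""
    lps = [0] * len(pattern)
    k = 0  # length of the prefix currently matched
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = lps[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        lps[j] = k
    return lps


def kmp_search(text: str, pattern: str) -> List[int]:
    """Return every starting index of pattern in text; the text index never moves back."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    lps = prefix_table(pattern)
    matches: List[int] = []
    k = 0  # number of pattern characters currently matched
    for i, c in enumerate(text):
        # On a mismatch, fall back within the pattern instead of re-reading the text.
        while k > 0 and c != pattern[k]:
            k = lps[k - 1]
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = lps[k - 1]  # continue so that overlapping matches are found
    return matches


print(prefix_table("AABA"))                     # [0, 1, 0, 1]
print(kmp_search("AABAACAADAABAABA", "AABA"))  # [0, 9, 12]
```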
Complexity:
The KMP matching algorithm uses the degenerating property of the pattern (the same sub-pattern
appearing more than once in the pattern) to improve the worst-case complexity of the search to
O(n). Including the O(m) preprocessing of the pattern, the total running time is O(n + m).
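To see the degenerating property concretely, the small sketch below computes the prefix table of a pattern in which sub-patterns such as "AABA" repeat. The helper `prefix_table` is hypothetical (it computes the standard KMP failure function) and is defined here so the block runs on its own. The example assumes that the first 10 pattern characters have matched and the 11th mismatches; KMP then resumes at pattern position lps[9] instead of 0.

```python
from typing import List


def prefix_table(pattern: str) -> List[int]:
    """lps[j] = length of the longest proper prefix of pattern[:j + 1] that is also its suffix."""
    lps = [0] * len(pattern)
    k = 0
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = lps[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        lps[j] = k
    return lps


pattern = "AABAACAABAA"
lps = prefix_table(pattern)
print(lps)  # [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5]

# Suppose pattern[:10] matched the text but pattern[10] did not.
j = 10
resume = lps[j - 1]
# The text already ends with this prefix, so it need not be compared again.
print(resume, pattern[:resume])  # 4 AABA
```

The more the pattern repeats itself, the larger the prefix-table entries and the more characters KMP can skip after a mismatch.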
6. COMPARISON