
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Design and Analysis of Algorithms Lab Part B


String Matching Algorithms
(18CS43)
2021-2022

Submitted by

Md Zeaul Haque 1RV20CS086

Under the guidance of

Prof. Ganashree K C
Assistant Professor,
Dept. of CSE,
RV College of Engineering.
STRING MATCHING

ACKNOWLEDGEMENT

We are highly indebted to R V College of Engineering for its guidance and
constant supervision, for providing the necessary information regarding the
project, and for its support in completing the project.

We would like to express our gratitude to the designated faculty,
Prof. Ganashree K C, for their kind cooperation and encouragement,
which helped us complete this project. We would also like to
express our gratitude to our HOD, Dr. Ramakanth Kumar P, and
our principal, Dr. K. N. Subramanya, for this opportunity.

Lastly, we would also like to thank all the people and sources that
motivated us to take up this topic.

Dept. of CSE, RVCE 2021-22 pg. 2



CONTENTS

1. Problem Statement

2. Naïve String Matching

3. Horspool’s String Matching

4. Rabin Karp String Matching

5. Knuth Morris Pratt String Matching

6. Comparison


1. PROBLEM STATEMENT

String matching is one of the most frequently used operations in computer programs. It is
therefore important to identify efficient ways to match strings and find patterns in texts. This
report explores four algorithms that can be applied to find a pattern in an input text: the naïve
algorithm, Horspool’s algorithm, the Rabin-Karp algorithm, and the Knuth-Morris-Pratt algorithm.
Finally, we compare the time complexity of these algorithms.


2. NAÏVE ALGORITHM

The naïve algorithm slides the pattern over the text one position at a time. At each position it
compares the pattern with the text character by character, and it continues until every alignment
in the text has been tried. The code for the naïve algorithm is as follows:
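
A minimal Python sketch of the naïve approach described above. The function name `naive_search`, the 0-based indexing, and the choice to return every match position are assumptions made for illustration, not details taken from the report's own listing.

```python
def naive_search(text, pattern):
    """Return every 0-based index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    matches = []
    # Try every alignment of the pattern against the text.
    for i in range(n - m + 1):
        j = 0
        # Compare left to right until a mismatch or the pattern is exhausted.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            matches.append(i)
    return matches
```

Each of the n - m + 1 alignments can cost up to m comparisons, which is where the O(nm) worst case of this approach comes from.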

Output:


3. HORSPOOL’S ALGORITHM

Horspool’s string matching algorithm finds occurrences of a pattern (P) within a text (T).
Unlike the naïve algorithm, Horspool’s algorithm compares the pattern with the text from right to
left, starting with the text character aligned with the last pattern character. Before searching,
we preprocess the pattern into a shift table. For each character, the table stores the distance
from its rightmost occurrence among the first m-1 pattern characters to the end of the pattern;
characters that do not occur there get a shift of m, the pattern length. On a mismatch, the entire
pattern is shifted to the right by the table value for the text character aligned with the last
pattern character. The code is as follows:
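
A minimal Python sketch of the shift table and the right-to-left search described above. The names `build_shift_table` and `horspool_search`, the dictionary-based table, and the choice to report every match are assumptions made for illustration, not details taken from the report's own listing.

```python
def build_shift_table(pattern):
    """Map each character to its Horspool shift distance."""
    m = len(pattern)
    table = {}
    # Only the first m-1 characters contribute; later (rightmost)
    # occurrences overwrite earlier ones, giving the smallest safe shift.
    for i in range(m - 1):
        table[pattern[i]] = m - 1 - i
    return table


def horspool_search(text, pattern):
    """Return every 0-based index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    table = build_shift_table(pattern)
    matches = []
    i = m - 1  # text index aligned with the last pattern character
    while i < n:
        k = 0  # number of characters matched so far, right to left
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            matches.append(i - m + 1)
        # Shift by the table entry for the text character under the
        # pattern's last position; unknown characters shift by m.
        i += table.get(text[i], m)
    return matches
```

For the pattern `BARBER`, this table holds the shifts B=2, A=4, R=3, and E=1, and every other character shifts the pattern by its full length of 6.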

Output:

Complexity:
In the worst case, the performance of Horspool’s algorithm is O(mn), where m is the length of the
pattern and n is the length of the text. The average time is O(n). In the best case, the
performance is sub-linear, about n/m character comparisons, and is in fact identical to that of
Boyer-Moore.


4. RABIN KARP ALGORITHM

The Rabin-Karp algorithm searches for a pattern in a text using a hash function. Unlike the naïve
algorithm, it does not compare the pattern character by character at every position. Instead, it
compares the hash of the pattern with the hash of each text window of the same length, which
filters out the windows that cannot match, and it performs the character comparison only when the
two hash values are equal. The code for the algorithm is as follows:
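
A minimal Python sketch of the hash-then-verify idea described above, using a polynomial rolling hash. The name `rabin_karp_search` and the default `base` and `modulus` values are assumptions made for illustration, not details taken from the report's own listing.

```python
def rabin_karp_search(text, pattern, base=256, modulus=1_000_000_007):
    """Return every 0-based index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Weight of the leading character of a window: base^(m-1) mod modulus.
    high = pow(base, m - 1, modulus)
    p_hash = t_hash = 0
    # Hash the pattern and the first text window.
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % modulus
        t_hash = (t_hash * base + ord(text[i])) % modulus
    matches = []
    for i in range(n - m + 1):
        # Equal hashes may still be a collision, so verify the characters.
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Roll the hash: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % modulus
    return matches
```

Rolling the hash updates each window in constant time, so the character comparison runs only on windows whose hash equals that of the pattern. The verification step keeps the result correct even when different windows collide on the same hash value.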

Output :


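Complexity:
The average and best-case running time of the Rabin-Karp algorithm is O(n+m), where n is the
length of the text and m is the length of the pattern. Its worst-case time is O(nm), which occurs
when the hash values match at most positions and every such window must be verified character by
character.

5. KNUTH MORRIS PRATT ALGORITHM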

The KMP algorithm, named after its inventors Knuth, Morris, and Pratt, is a linear-time
algorithm for the string matching problem. It achieves a matching time of O(n), where n is the
length of the text T, by never moving backward in the text: a text character that has already
been matched against some character of the pattern P is not compared again. The basic idea behind
the KMP algorithm is: whenever we detect a mismatch (after some matches), we already know some of
the characters in the text of the next window. We take advantage of this information to avoid
re-matching the characters that we know will match anyway. The code for the algorithm is as
follows:
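
A minimal Python sketch of the idea described above. It assumes the common formulation based on an `lps` table (for each prefix of the pattern, the length of the longest proper prefix that is also a suffix). The names `build_lps` and `kmp_search` are illustrative and not taken from the report's own listing.

```python
def build_lps(pattern):
    """lps[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    m = len(pattern)
    lps = [0] * m
    length = 0  # length of the current matched prefix
    i = 1
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            # Fall back to the next shorter candidate prefix.
            length = lps[length - 1]
        else:
            lps[i] = 0
            i += 1
    return lps


def kmp_search(text, pattern):
    """Return every 0-based index at which pattern occurs in text."""
    m = len(pattern)
    if m == 0:
        return []
    lps = build_lps(pattern)
    matches = []
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):  # the text index never moves backward
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]  # reuse what is already known to match
        if ch == pattern[j]:
            j += 1
        if j == m:
            matches.append(i - m + 1)
            j = lps[j - 1]  # continue, allowing overlapping matches
    return matches
```

On a mismatch, only the pattern index `j` falls back through the `lps` table while the text index keeps advancing, which is what keeps the matching phase linear in the length of the text.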


Output:

Complexity:
The KMP matching algorithm exploits the degenerating property of the pattern (the same
sub-patterns appearing more than once in the pattern) and improves the worst-case matching time
to O(n), where n is the length of the text. Preprocessing the pattern takes an additional O(m),
where m is the length of the pattern, for O(n+m) overall.

6. COMPARISON
