An Efficient Text Pattern Matching Algorithm For Retrieving Information From Desktop


ISSN (Print) : 0974-6846

Indian Journal of Science and Technology, Vol 9(43), DOI: 10.17485/ijst/2016/v9i43/95454, November 2016 ISSN (Online) : 0974-5645

An Efficient Text Pattern Matching Algorithm for Retrieving Information from Desktop
R. Janani* and S. Vijayarani
Department of Computer Science, School of Computer Science and Engineering, Bharathiar University,
Coimbatore - 641046, Tamil Nadu, India; janani.sengodi@gmail.com, vijimohan_2000@yahoo.com

Abstract
Objectives: To retrieve the information after analyzing the contents of the documents which are stored in the desktop
by applying string matching algorithms. Methods/Statistical Analysis: To analyze the content of the documents, the
various pattern matching algorithms are used to find all the occurrences of a limited set of patterns within an input text
or input document. In order to perform this task, this research work used four existing string matching algorithms; they
are Brute Force algorithm, Knuth-Morris-Pratt algorithm (KMP), Boyer Moore algorithm and Rabin Karp algorithm. This
work also proposes three new string matching algorithms. They are Enhanced Boyer Moore algorithm, Enhanced Rabin
Karp algorithm and Enhanced Knuth-Morris-Pratt algorithm. Findings: For experimentation, this work has used two
types of documents, i.e. .txt and .docx. Performance measures used are search time, number of iterations and accuracy.
From the experimental results, it is realized that the enhanced KMP algorithm gives better accuracy compared to other
string matching algorithms. Application/Improvements: Normally, these algorithms are used in the field of text mining,
document classification, content analysis and plagiarism detection. In future, these algorithms have to be enhanced to
improve their performance and the various types of documents will be used for experimentation.

Keywords: Brute Force, Boyer Moore, Information Retrieval, Knuth-Morris-Pratt, Pattern Matching, Rabin Karp

1. Introduction

An information retrieval system identifies the documents in a document database that match a user's query. In such a system the text can be divided into two important units: the document (a journal paper, book, chapter, section, web page, paragraph, source code of a computer program, etc.) and the term (a word, pair of words, or phrase within a particular document) [1]. Desktop search performs searches over the content of files or documents. Pattern matching algorithms, also known as string matching algorithms, are an essential class of string algorithms that find one or all occurrences of a string within a large body of text [2]. String matching underlies numerous problems and is used in applications such as text mining, data retrieval, DNA pattern matching and finding important keywords in security applications [3].

String matching algorithms use two techniques: exact matching and approximate matching. In exact string matching, the pattern is matched in full against a particular window of the input text and the starting index position is reported [4]. The algorithms listed for this type are Knuth-Morris-Pratt (KMP), Needleman Wunsch (NW), Dynamic Programming, Boyer Moore and Smith Waterman (SW) [5]. In approximate string matching, output is produced as soon as a certain part of the pattern matches the selected text window. Examples of this category are Brute Force, Fuzzy string searching and Rabin Karp. Various kinds of string matching algorithms are used to solve string matching or string searching problems, for example, polymorphic string matching, wide

*Author for correspondence



window pattern matching, prefix matching, suffix matching and longest common subsequence algorithms [6].

This paper is structured as follows. Section 2 presents the methodology of this research work. Sections 3 and 4 describe the existing and the enhanced algorithms. Results and discussion are given in Section 5, and Section 6 gives the conclusion.

The work in [1] presents string matching algorithms that perform character comparison effectively and are therefore used for DNA searching, protein sequence searching and English text searching. Connection is a file system search tool that combines traditional content-based search with context information collected from user activity. By tracing file system calls, Connection can identify sequential relationships between files and use them to extend and reorder conventional content search results. This tool improved both average recall and average precision over an advanced content-only search system. String searching algorithms play a major role in detecting patterns in text.

The authors of [7] introduced a new Enhanced Checking and Skipping Algorithm (ECSA). The algorithm enhances traditional string searching algorithms by changing character-comparison into character-access, using a condition-type character-access instead of a number-comparison, and by starting the comparison at the latest mismatch from the prior check, which raises the probability of finding a mismatch earlier if there is one. The reported performance of the enhanced algorithm is better than that of other existing algorithms.

The authors of [8] presented their algorithm in 1977. At that time it was considered the most efficient string searching algorithm. It performs character comparisons in reverse order, from right to left, and does not require the complete pattern to be searched in case of a mismatch. It uses two shifting rules to shift the pattern to the right when a match or mismatch occurs. The time and space complexity of the preprocessing phase is O(m + |Σ|) and the worst-case running time of the searching phase is O(nm + |Σ|). The best case of the Boyer-Moore algorithm is O(n/m).

The work in [9] describes several string matching algorithms and reports their space and time complexities. The authors assessed the performance of the algorithms and verified them with biological sequences. The functional and structural relationship of a biological sequence is calculated from relationships on that particular sequence.

The work in [10] discussed a method for developing general computing applications that are flexible and adjustable for users. In this view, Information Retrieval (IR) is frequently defined in terms of locating and delivering particular documents to a user to satisfy their information needs. In most cases, morphological variants of words have related semantic interpretations and can be treated as equivalent for the purpose of IR applications. The Context-Aware Stemming (CAS) algorithm is proposed, which is an improved version of the widely used Porter's stemmer. Because it generates only meaningful stemmed words as output, the results show that the proposed algorithm reduces the error rate of Porter's algorithm from 76.7% to 6.7% without compromising its efficiency.

The authors of [11] observed that proper classification of text documents involves information retrieval, machine learning and Natural Language Processing (NLP) techniques. The aim is to focus on important approaches to automatic text classification based on machine learning techniques, viz. supervised, unsupervised and semi-supervised. They presented a review of several text classification approaches under the machine learning paradigm.

In [12], the general methodology of an Intrusion Detection System (IDS) is interpreted as model compatibility: it identifies the damage happening on the network using particular models and orders. To perform this task, the normal behavior of the network is measured by modeling, and in the next step it is used as a reference model for specifying unusual behavior. The study aims to determine and select the most efficient procedure for this task by investigating, applying and gathering all kinds of model compatibility techniques, so that the most suitable result is achieved when matching known attacks against the original models.

2. Methodology

The main goal of this research work is to retrieve information from the desktop by analyzing the contents of documents using string matching algorithms. To perform this task, this research work uses four existing string matching algorithms: the Brute Force algorithm, the Knuth-Morris-Pratt (KMP) algorithm, the Boyer Moore algorithm and the Rabin Karp algorithm. This work also proposes three new string matching algorithms. They are the Enhanced Boyer Moore algorithm, Enhanced Rabin

2 Vol 9 (43) | November 2016 | www.indjst.org Indian Journal of Science and Technology

Karp algorithm and Enhanced Knuth-Morris-Pratt algorithm. The performance factors used are the time taken for searching the pattern, the number of iterations required and the accuracy for single word search, multiple words search and file search. Figure 1 represents the architecture of this research work.

Figure 1. Methodology.

3. Existing Algorithms

3.1 Brute Force Algorithm

The brute force algorithm is perhaps one of the simplest string matching algorithms. It is also known as the naïve algorithm. It compares the characters of the input text and the pattern from left to right. After a mismatch or a complete match, it shifts the pattern one step to the right [13].

Algorithm 1. Brute Force

This algorithm has no preprocessing stage and needs only constant extra space. Its main advantage is that it is very easy to implement, but it is very slow compared to other algorithms [2]. The time complexity of this algorithm is O(mn) and the expected number of character comparisons is 2n.

3.2 Boyer Moore Horspool Algorithm

The Boyer Moore Horspool algorithm, or Horspool algorithm, is used to find a substring in the input text or in large document collections. It was published by Nigel Horspool in 1980. It is a simplification of the Boyer Moore algorithm, which is related to the Knuth-Morris-Pratt (KMP) algorithm [6].

The Boyer-Moore-Horspool algorithm compares the text character t_i with the last character p_m of the pattern. If they match, it compares the preceding characters of the text with the corresponding characters of the pattern, right to left, until it detects either an occurrence of the pattern or a mismatch on a text character. Whether or not a match occurred, it then slides the pattern according to the next occurrence of the character t_i in the pattern [8]. The number of positions to move is determined by the value of skip(t_i).

Computation of the skip table in the Boyer-Moore-Horspool algorithm differs subtly from the original skip table definition proposed in the Boyer-Moore algorithm [11]. In the Boyer-Moore algorithm, the value of skip(p_m) is always 0. In the Horspool version, skip(p_m) = m if p_m is unique within the pattern; otherwise skip(p_m) = m - k, where p_k is the penultimate appearance of the character p_m in the pattern, that is, its rightmost occurrence before the last position [14].

3.3 Rabin-Karp Algorithm

The Rabin-Karp algorithm is a simple string searching algorithm developed by Michael O. Rabin and Richard M. Karp in 1987. It uses a hash function to discover potential occurrences of the pattern in the input text. For a text of length n and p patterns of combined length m, its average and best-case running time is O(n+m) in space O(p), and the worst-case time is O(nm) in space O(m) [15].

The algorithm computes the hash value of the pattern and then the hash value of every possible m-length substring of the input text. If the hash values of the pattern and a text substring match, then


it returns the position; otherwise the next substring of length m is hashed and compared.

Algorithm 2. Boyer Moore Horspool

Algorithm 3: Rabin-Karp

3.4 Knuth-Morris-Pratt Algorithm

Knuth, Morris and Pratt developed a linear-time string searching algorithm by analyzing the brute force (naïve) algorithm. The algorithm was developed in 1974 by Donald Knuth and Vaughan Pratt, and independently by James H. Morris, and they published it jointly in 1977. The Knuth-Morris-Pratt algorithm reduces the total number of comparisons of the pattern against the input string [16].

A matching time of O(n) is achieved by avoiding comparisons with elements of 'S' that have previously been involved in a comparison with some element of the pattern 'p' to be matched, i.e., backtracking on the string 'S' never occurs [17].

Components of KMP algorithm

1. The prefix function, Π: The prefix function Π for a pattern summarizes the knowledge about how the pattern matches against shifts of itself. This information can be used to avoid useless shifts of the pattern 'p'. In other words, it avoids backtracking on the string 'S'.
2. The KMP matcher: With string 'S', pattern 'p' and prefix function 'Π' as inputs, it finds the occurrences of 'p' in 'S' and returns the number of shifts of 'p' after which each occurrence is found.
3. Running-time analysis: The running time for computing the prefix function is Θ(m) and the running time of the matching function is Θ(n).

Algorithm 4: Knuth–Morris–Pratt

Algorithm 5: Generate Next Table

4. Enhanced Algorithms

4.1 Enhanced Boyer Moore Horspool Algorithm

This algorithm is an efficient string matching algorithm, and it has been the standard point of reference for the


string matching problems. This algorithm checks the characters of the pattern from right to left [6]. On a mismatch or a complete match of the full pattern, it uses two functions to shift the window from left to right: the good suffix shift and the bad character shift. The searching phase of the algorithm has O(nm) time complexity and the best-case performance is O(n/m) [18].

First, the state transition table S is calculated from the pattern P; the pattern may be a single line, multiple lines or a file. Then the pointer value and state values are set. If the pointer value is smaller than the pattern and text values, the characters are read from right to left, beginning with the rightmost one. If a match occurs, the index of the character is returned; otherwise the pointer value is shifted and the same process is repeated at each level until the pattern is found or not found.

Algorithm 6: Enhanced Boyer Moore Horspool

4.2 Enhanced Rabin Karp Algorithm

This searching algorithm uses a hashing function to find any one of a set of patterns in the input text. Hashing offers a simple method to avoid a large number of character comparisons [19]. Instead of checking at each position of the text, it checks only the content of the window to see whether the pattern occurs or not.

For a text of length N and patterns of combined length M, its best-case running time is O(N+M) and its worst-case time is O(NM). First the algorithm finds the hash value of the pattern. Then it checks the input text along with its hash value. If a mismatch occurs, it shifts the window to the next character, calculates the hash value, and continues the same process. Otherwise it returns the index position of the particular character.

Algorithm 7: Enhanced Rabin Karp

4.3 Enhanced Knuth-Morris-Pratt Algorithm

The Knuth-Morris-Pratt algorithm is one of the efficient string matching algorithms. It searches for occurrences of a pattern p within a main text t by using the observation that, when a mismatch occurs, the pattern itself contains sufficient information to determine where the next match can begin, thus avoiding the re-examination of previously matched characters.

The algorithm uses a bit table to discover the mismatch of the pattern in an input text. It performs the comparison from left to right. It uses the bit table for the comparison; if there is a match, it returns the index in the text. Otherwise it checks the next bit.

Algorithm 8: Enhanced Knuth-Morris-Pratt
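The hash-and-slide procedure described for Rabin-Karp in Sections 3.3 and 4.2 can be illustrated with a minimal Java sketch. It is not the authors' implementation: the class name `RabinKarpSketch`, the polynomial hash, and the constants `BASE` and `MOD` are assumptions chosen for illustration, since the paper does not specify its hash function.

```java
import java.util.ArrayList;
import java.util.List;

/** Rabin-Karp sketch: compare rolling hashes first, characters only on a hash hit. */
class RabinKarpSketch {
    // Hypothetical hash parameters; the paper does not state which ones it uses.
    private static final long BASE = 256;
    private static final long MOD = 1_000_000_007L;

    /** Returns the start index of every occurrence of pattern in text. */
    static List<Integer> search(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        int n = text.length(), m = pattern.length();
        if (m == 0 || m > n) return hits;

        long high = 1; // BASE^(m-1) mod MOD, the weight of the window's first character
        for (int i = 1; i < m; i++) high = (high * BASE) % MOD;

        long hp = 0, ht = 0; // hash of the pattern and of the current text window
        for (int i = 0; i < m; i++) {
            hp = (hp * BASE + pattern.charAt(i)) % MOD;
            ht = (ht * BASE + text.charAt(i)) % MOD;
        }

        for (int s = 0; s + m <= n; s++) {
            // Equal hashes only suggest a match; verify characters to rule out collisions.
            if (hp == ht && text.regionMatches(s, pattern, 0, m)) hits.add(s);
            if (s + m < n) {
                // Slide the window: remove text[s], then append text[s + m].
                ht = (ht - text.charAt(s) * high % MOD + MOD) % MOD;
                ht = (ht * BASE + text.charAt(s + m)) % MOD;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("abracadabra", "abra")); // prints [0, 7]
    }
}
```

The character-by-character verification after a hash hit is what produces the O(nm) worst case quoted above: if every window collides with the pattern hash, every window is compared in full.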


5. Result and Discussion

In this analysis, the performance factors are search time, number of iterations and relevancy for various types of inputs. The inputs are a single word, multiple words and a file. For this analysis, the existing and enhanced string matching algorithms were implemented in Java.

Search Time: The time taken for searching the pattern within the input text. It can be estimated by comparison of each character in the pattern with the input text.

Iterations: The total number of iterations for matching the pattern with the input text. It depends on the given input document and on the algorithm.

Relevancy: The accuracy of the algorithm; the accuracy is calculated by using the formula as follows,

The sample input for this analysis is given in Table 1.

Table 1. Sample Input
File Name | Number of Words | Size (KB) | Sample
a1.txt    | 1679            | 11.41     |
a1.docx   | 2630            | 27.6      |

Table 2 illustrates the performance metrics like time, number of iterations and relevancy of the Brute Force Algorithm for the text file (a1.txt).

Table 2. Performance analysis of Brute Force Algorithm for text files (a1.txt)
Input          | Time (ms) | Number of Iterations | Relevancy (%)
Single Word    | 0         | 2                    | 100
Multiple Words | 16        | 18                   | 100
File           | 47        | 42                   | 90

Figure 2. Performance analysis of Brute Force Algorithm.
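As an illustration of how the search time and iteration figures for the Brute Force algorithm (Section 3.1) might be collected, the following Java sketch instruments a brute force search with a comparison counter and a timer. It is a sketch under stated assumptions, not the authors' code: the names `BruteForceSketch` and `Result` are hypothetical, and counting one "iteration" per character comparison is an assumption, since the paper does not define the unit precisely.

```java
/** Brute force search instrumented with a comparison counter and a wall-clock timer. */
class BruteForceSketch {
    /** First match index (-1 if none) and the number of character comparisons made. */
    static final class Result {
        final int index;
        final long comparisons;
        Result(int index, long comparisons) {
            this.index = index;
            this.comparisons = comparisons;
        }
    }

    static Result search(String text, String pattern) {
        int n = text.length(), m = pattern.length();
        long comparisons = 0;
        for (int s = 0; s + m <= n; s++) { // try every alignment, shifting by one
            int j = 0;
            while (j < m) {                // compare left to right
                comparisons++;
                if (text.charAt(s + j) != pattern.charAt(j)) break;
                j++;
            }
            if (j == m) return new Result(s, comparisons);
        }
        return new Result(-1, comparisons);
    }

    public static void main(String[] args) {
        String text = "information retrieval from desktop documents";
        long start = System.nanoTime();
        Result r = search(text, "desktop");
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println("index=" + r.index + " comparisons=" + r.comparisons
                + " time(us)=" + elapsedMicros);
    }
}
```

Timings of 0 ms, as reported for single-word searches, are to be expected with millisecond resolution on inputs of this size; `System.nanoTime()` gives a finer-grained reading.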


Figure 2 gives the performance analysis of the Brute Force Algorithm for the text file (a1.txt).

Table 3. Performance analysis of Brute Force Algorithm for docx files (a1.docx)
Input          | Time (ms) | Number of Iterations | Relevancy (%)
Single Word    | 0         | 3                    | 100
Multiple Words | 19        | 20                   | 100
File           | 53        | 51                   | 91

Table 3 describes the performance measures like time, number of iterations and relevancy of the Brute Force Algorithm for the docx file (a1.docx).

Figure 3. Performance accuracy for Brute Force Algorithm.

Figure 3 gives the performance analysis of the Brute Force Algorithm for the docx file (a1.docx).

Table 4 compares the performance measures like time, number of iterations and relevancy of the Boyer Moore algorithm and the Enhanced Boyer Moore algorithm for the text file (a1.txt). From the analysis, the enhanced Boyer Moore algorithm gives better results than the existing algorithm.

Figure 4. Performance accuracy for Boyer Moore Algorithm.

Figure 4 gives the performance accuracy for the Boyer Moore Algorithm for the text file (a1.txt); from this graph it is observed that the enhanced Boyer Moore algorithm gives better results.

Table 5 compares the performance measures like time, number of iterations and relevancy of the Boyer Moore algorithm and the Enhanced Boyer Moore algorithm for the docx file (a1.docx). From the analysis, the enhanced Boyer Moore algorithm gives better results than the existing algorithm.

Table 4. Performance analysis of Boyer Moore Algorithm and Enhanced Boyer Moore Algorithm for text files (a1.txt)
               | Boyer Moore Algorithm                            | Enhanced Boyer Moore Algorithm
Input          | Time (ms) | Number of Iterations | Relevancy (%) | Time (ms) | Number of Iterations | Relevancy (%)
Single Word    | 0         | 3                    | 100           | 0         | 2                    | 100
Multiple Words | 15        | 16                   | 100           | 12        | 15                   | 99
File           | 36        | 38                   | 92            | 32        | 25                   | 95

Table 5. Performance analysis of Boyer Moore Algorithm and Enhanced Boyer Moore Algorithm for docx files (a1.docx)
               | Boyer Moore Algorithm                            | Enhanced Boyer Moore Algorithm
Input          | Time (ms) | Number of Iterations | Relevancy (%) | Time (ms) | Number of Iterations | Relevancy (%)
Single Word    | 0         | 4                    | 99            | 0         | 3                    | 100
Multiple Words | 18        | 20                   | 97            | 17        | 15                   | 99
File           | 50        | 38                   | 90            | 45        | 35                   | 99
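For reference, the baseline Boyer-Moore-Horspool search of Section 3.2 can be sketched in Java as follows. This is a minimal illustration of the standard Horspool skip rule, not the enhanced algorithm proposed in this work; the class name `HorspoolSketch` and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Horspool sketch: compare right to left, then shift by the skip value of the window's last character. */
class HorspoolSketch {
    /**
     * Skip table over pattern[0..m-2]: distance from the rightmost occurrence of each
     * character (excluding the last position) to the end of the pattern.
     * Characters absent from the table shift by the full pattern length m.
     */
    static Map<Character, Integer> buildSkipTable(String pattern) {
        Map<Character, Integer> skip = new HashMap<>();
        int m = pattern.length();
        for (int j = 0; j < m - 1; j++) skip.put(pattern.charAt(j), m - 1 - j);
        return skip;
    }

    /** Returns the index of the first occurrence of pattern in text, or -1 if there is none. */
    static int search(String text, String pattern) {
        int n = text.length(), m = pattern.length();
        if (m == 0) return 0;
        Map<Character, Integer> skip = buildSkipTable(pattern);
        int s = 0;
        while (s + m <= n) {
            int j = m - 1;
            while (j >= 0 && text.charAt(s + j) == pattern.charAt(j)) j--; // right to left
            if (j < 0) return s;
            // Shift according to the text character aligned with the pattern's last position.
            s += skip.getOrDefault(text.charAt(s + m - 1), m);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(search("here is a simple example", "example")); // prints 17
    }
}
```

With the pattern `example`, the final character `e` also occurs at the first position, so its skip value is m - k = 7 - 1 = 6, while a character that does not occur in the first m - 1 positions shifts the window by the full length 7.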


Figure 5. Performance accuracy for Boyer Moore Algorithm.

Figure 5 gives the performance accuracy for the Boyer Moore Algorithm for the docx file (a1.docx); from this graph it is observed that the enhanced Boyer Moore algorithm gives better results.

Table 6 shows the performance measures of the Rabin Karp algorithm and the Enhanced Rabin Karp algorithm for text files (a1.txt). From the experimental results, the Enhanced Rabin Karp algorithm performs well when compared to the existing algorithm.

Table 6. Performance analysis of Rabin Karp algorithm and Enhanced Rabin Karp Algorithm for text files (a1.txt)
               | Rabin Karp Algorithm                             | Enhanced Rabin Karp Algorithm
Input          | Time (ms) | Number of Iterations | Relevancy (%) | Time (ms) | Number of Iterations | Relevancy (%)
Single Word    | 0         | 2                    | 100           | 0         | 1                    | 100
Multiple Words | 22        | 16                   | 100           | 20        | 12                   | 100
File           | 45        | 24                   | 95            | 30        | 23                   | 96

Figure 6. Performance accuracy for Rabin Karp Algorithm.

Figure 6 describes the performance accuracy for the Rabin Karp Algorithm for the text file (a1.txt); from this figure it is observed that the Enhanced Rabin Karp algorithm performs well when compared to the existing algorithm.

Table 7 shows the performance measures of the Rabin Karp algorithm and the Enhanced Rabin Karp algorithm for docx files (a1.docx). From the experimental results, the Enhanced Rabin Karp algorithm performs well when compared to the existing algorithm.

Table 7. Performance analysis of Rabin Karp algorithm and Enhanced Rabin Karp Algorithm for docx files (a1.docx)
               | Rabin Karp Algorithm                             | Enhanced Rabin Karp Algorithm
Input          | Time (ms) | Number of Iterations | Relevancy (%) | Time (ms) | Number of Iterations | Relevancy (%)
Single Word    | 0         | 3                    | 100           | 0         | 2                    | 100
Multiple Words | 31        | 19                   | 100           | 26        | 17                   | 100
File           | 46        | 20                   | 90            | 31        | 18                   | 97

Figure 7. Performance accuracy for Rabin Karp Algorithm.

Figure 7 describes the performance accuracy for the Rabin Karp Algorithm for the docx file (a1.docx); from this figure it is observed that the Enhanced Rabin Karp algorithm performs well when compared to the existing algorithm.

Table 8 shows the performance measures of the Knuth-Morris-Pratt Algorithm and the Enhanced Knuth-Morris-Pratt Algorithm for text files (a1.txt). From the experimental results, the Enhanced Knuth-Morris-Pratt algorithm performs well when compared to the existing algorithm.

Table 8. Performance analysis of Knuth-Morris-Pratt Algorithm and Enhanced Knuth-Morris-Pratt Algorithm for text files (a1.txt)
               | Knuth-Morris-Pratt Algorithm                     | Enhanced Knuth-Morris-Pratt Algorithm
Input          | Time (ms) | Number of Iterations | Relevancy (%) | Time (ms) | Number of Iterations | Relevancy (%)
Single Word    | 0         | 2                    | 100           | 0         | 1                    | 100
Multiple Words | 15        | 9                    | 100           | 11        | 7                    | 100
File           | 35        | 16                   | 100           | 21        | 11                   | 100

Figure 8 shows the performance accuracy for the Knuth-Morris-Pratt Algorithm for the text file (a1.txt). From this analysis, the Enhanced Knuth-Morris-Pratt algorithm performs well when compared to the existing algorithm.

Figure 8. Performance accuracy for Knuth-Morris-Pratt Algorithm.

Table 9 shows the performance measures of the Knuth-Morris-Pratt Algorithm and the Enhanced Knuth-Morris-Pratt Algorithm for docx files (a1.docx). From the experimental results, the Enhanced Knuth-Morris-Pratt algorithm performs well when compared to the existing algorithm.

Table 9. Performance analysis of Knuth-Morris-Pratt Algorithm and Enhanced Knuth-Morris-Pratt Algorithm for docx files (a1.docx)
               | Knuth-Morris-Pratt Algorithm                     | Enhanced Knuth-Morris-Pratt Algorithm
Input          | Time (ms) | Number of Iterations | Relevancy (%) | Time (ms) | Number of Iterations | Relevancy (%)
Single Word    | 0         | 3                    | 100           | 0         | 2                    | 100
Multiple Words | 18        | 11                   | 100           | 15        | 9                    | 100
File           | 39        | 18                   | 100           | 30        | 12                   | 100

Figure 9 shows the performance accuracy for the Knuth-Morris-Pratt Algorithm for the docx file (a1.docx). From this analysis, the Enhanced Knuth-Morris-Pratt algorithm performs well when compared to the existing algorithm.

Figure 9. Performance accuracy for Knuth-Morris-Pratt Algorithm.

From the experimental results, it is observed that the enhanced Knuth-Morris-Pratt algorithm performs well when compared to all other existing algorithms. Figure 10 shows the sample output of the enhanced Knuth-Morris-Pratt algorithm for the text file (a1.txt).

Figure 10. Sample output of Knuth-Morris-Pratt Algorithm for text file (a1.txt).
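The two KMP components listed in Section 3.4, the prefix function and the matcher, can be sketched in Java as follows. This shows the classical Knuth-Morris-Pratt algorithm only; the bit-table variant proposed as the enhanced algorithm is not reproduced here because the paper gives no listing for it. The class name `KmpSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Classical KMP sketch: prefix function plus a matcher that never backtracks on the text. */
class KmpSketch {
    /** pi[i] = length of the longest proper prefix of p[0..i] that is also a suffix of p[0..i]. */
    static int[] prefixFunction(String p) {
        int[] pi = new int[p.length()];
        int k = 0; // length of the currently matched prefix
        for (int i = 1; i < p.length(); i++) {
            while (k > 0 && p.charAt(i) != p.charAt(k)) k = pi[k - 1]; // fall back
            if (p.charAt(i) == p.charAt(k)) k++;
            pi[i] = k;
        }
        return pi;
    }

    /** Returns every shift at which pattern p occurs in string s. */
    static List<Integer> search(String s, String p) {
        List<Integer> shifts = new ArrayList<>();
        if (p.isEmpty()) return shifts;
        int[] pi = prefixFunction(p);
        int k = 0; // number of pattern characters matched so far
        for (int i = 0; i < s.length(); i++) { // i only moves forward
            while (k > 0 && s.charAt(i) != p.charAt(k)) k = pi[k - 1];
            if (s.charAt(i) == p.charAt(k)) k++;
            if (k == p.length()) {
                shifts.add(i - k + 1);
                k = pi[k - 1]; // keep going to find overlapping occurrences
            }
        }
        return shifts;
    }

    public static void main(String[] args) {
        System.out.println(search("abababa", "aba")); // prints [0, 2, 4]
    }
}
```

The text index `i` is never decremented, which is the "no backtracking on S" property; the prefix function takes Θ(m) to build and the matcher runs in Θ(n), matching the running-time analysis given in Section 3.4.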


Figure 11 shows the sample output of the enhanced Knuth-Morris-Pratt algorithm for the docx file (a1.docx).

Figure 11. Sample output of Knuth-Morris-Pratt Algorithm for docx file (a1.docx).

6. Conclusion

Information retrieval (IR) is used to identify the relevant documents in a large document collection that match a user's query. The main goal of an information retrieval system is to discover the significant information that satisfies the user's information needs. Desktop search is where the information sources are the files stored on a personal computer, including email and web pages, based on content analysis. To analyze the content, various pattern matching algorithms are used to find all the occurrences of a limited set of patterns inside an input text or input document. String matching algorithms match the pattern exactly or approximately within the input document. They play a vital role in the field of information retrieval and are widely used to retrieve information from the desktop, finding one or all occurrences of a pattern in a large document collection.

This research work analyzes the performance measures of existing and enhanced string matching algorithms. The performance factors are time, number of iterations and accuracy for a single line, multiple lines and a file. From the analysis, among the existing algorithms the KMP algorithm gives the better accuracy for all the inputs. Among the enhanced algorithms, the enhanced KMP algorithm gives the better accuracy. Between the existing and enhanced KMP algorithms, the enhanced KMP algorithm gives the better accuracy.

7. References

1. Verma A, Kaur I, Singh I. Comparative analysis of data mining tools and techniques for information retrieval. Indian Journal of Science and Technology. 2016 Mar; 9(11):1–16.
2. Al-Mazroi A, Rashid NA. A Fast Hybrid Algorithm for the Exact String Matching Problem. American Journal of Engineering and Applied Sciences. 2011; 4(1):102–07.
3. Shweta C, Dharmadhikari D, Ingle M, Kulkarni P. Empirical Studies on Machine Learning Based Text Classification Algorithms. Advanced Computing: An International Journal (ACIJ). 2011; 2(6):161–69.
4. Bist AS. Pattern matching algorithms for computer virus detection. International Journal of Engineering Sciences and Research Technology. 2013; 2(1):28–9.
5. Naser MAS, Rashid NA, FaizAboalmaaly M. Quick-Skip search hybrid algorithm for the exact string matching problem. International Journal of Computer Theory and Engineering. 2012; 4(2):1–7.
6. Jony AI. Analysis of Multiple String Pattern Matching Algorithm. International Journal of Advanced Computer Science and Information Technology (IJACSIT). 2014; 3(4):344–53.
7. Moh'dMhashi M, Alwakeel M. New Enhanced Exact String, Searching Algorithm. IJCSNS International Journal of Computer Science and Network Security. 2010; 10(1):1–10.
8. Boyer RS, Moore JS. A fast string searching algorithm. Communication of the ACM. 1977; 20(10):762–72.
9. Pandiselvam P, Marimuthu T, Lawrance R. A Comparative Study on String Matching Algorithms of Biological Sequences, Springer Berlin Heidelberg. 2009; 510–17.
10. Charras C, Lecroq T, Daniel J. A Very fast string searching algorithm for small alphabets and long patterns, Combinational Pattern Matching, 9th Annual Symposium, CPM 98 Piscataway, New Jersey, USA. 2005; 1448:54–8.
11. Robert S, Boyer B, Moore JS. A fast string Searching Algorithm. Communication of the ACM. 1997; 20(10):762–72.
12. Hossein G, Shokoufeh S, Abozar S. A Survey of Pattern Matching Algorithm in Intrusion Detection System Tehran, Iran. Indian Journal of Science and Technology. 2016 Jun; 9(21):1–7.
13. Rahul M, Diwate B, Satish J, Alaspurkar A. Study of Different Algorithms for Pattern Matching. International Journal of Advanced Research in Computer Science and Software Engineering. 2013; 3(3):1–8.
14. Bhandari J, Kumar A. String Matching Rules Used By Variants of Boyer-Moore Algorithm. Journal of Global Research in Computer Science. 2014; 5(1).
15. Shivaji SK, Prabhudeva S. Plagiarism Detection by using Karp-Rabin and String Matching Algorithm Together. International Journal of Computer Applications. 2015; 116(23):1–5.
16. Wahlstrom S. Evaluation of String Searching Algorithms, Italy. 2004; 1–22.
17. Gope AP, Behera RN. A Novel Pattern Matching Algorithm in Genome Sequence Analysis. (IJCSIT) International Journal of Computer Science and Information Technologies. 2014; 5(4):5450–57.
18. Jony AI. Analysis of Multiple String Pattern Matching Algorithms. International Journal of Advanced Computer Science and Information Technology (IJACSIT). 2014; 3(4):344–53.
19. Harini R, Chandrasekar C. Efficient Sequential Pattern Matching Algorithm for Classified Brain Image. Indian Journal of Science and Technology. 2015 Jul; 8(14):1–10.
