Chapter 3
Review of Literature
There are many well-known string matching algorithms for one-dimensional text, such as
Knuth-Morris-Pratt and Boyer-Moore. There is also a string matching algorithm
that constructs a finite automaton for the pattern and uses it to find all occurrences of
the pattern in the text. The string matching problem has many variations. One
such variation is to find the occurrences of the pattern P in the text T while allowing
mismatches. This problem is also called approximate string matching. It
can be defined as follows: given a text T of length n, a pattern P of length m and
an integer k, find all occurrences of the pattern P in the text T with at most k
mismatches. Landau G.M. and Vishkin U. (1986) have given an algorithm which
runs in O(k(m log m + n)) time.
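The problem definition above can be made concrete with a brute-force check. The following is only a minimal sketch of the definition, not the Landau and Vishkin algorithm; it takes O(nm) time in the worst case, and the function name k_mismatch_occurrences is a hypothetical name chosen for illustration.

```python
def k_mismatch_occurrences(text, pattern, k):
    """Return every start index where pattern matches text with at most k mismatches."""
    n, m = len(text), len(pattern)
    hits = []
    for start in range(n - m + 1):
        mismatches = 0
        for offset in range(m):
            if text[start + offset] != pattern[offset]:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already exceeds the budget
        if mismatches <= k:
            hits.append(start)
    return hits


print(k_mismatch_occurrences("abcabd", "abd", 1))  # [0, 3]
```

With k = 0 the same function reduces to exact string matching.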
Landau G.M. and Vishkin U. (1988) have given an algorithm which runs
in O(m + nk^2) time for an alphabet whose size is fixed. For general input, the
algorithm runs in O(m log m + nk^2) time. In both cases, the space requirement
is O(m).
Several algorithms have been presented for approximate string matching that use
O(kn) comparisons in the worst case or in the average case by taking advantage
of the properties of the dynamic programming paradigm. Yates R.A.B. and
Perleberg C.H. (1996) have given an algorithm for approximate string matching that
runs in linear time for most practical cases. Yates R.A.B. and Navarro G. (1996)
have given a new algorithm for on-line approximate string matching. Their algorithm
is based on the simulation of a nondeterministic finite automaton that is built from the
pattern and takes the text as input.
The next generalization of the string matching problem is to find the occurrences
of multiple patterns P1, P2, P3, ... in the given text T. This problem is also
known as multiple pattern matching. Aho A. and Corasick M.J. (1975) have given an
algorithm to solve this problem. This algorithm has been used to improve the speed
of a library bibliography search program by a factor of 5 to 10. The main idea of the
research is to build a finite state pattern matching machine for the multiple patterns
and to give the text as input to this machine. The machine signals whenever it
has found a match among the given patterns. Construction of this pattern matching
machine takes time proportional to the sum of the lengths of the patterns.
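The pattern matching machine described above can be sketched as a trie of the patterns with failure links. This is an illustrative sketch in the spirit of the Aho and Corasick construction, not a reproduction of their original presentation; the names build_machine and find_all are assumptions made for this example.

```python
from collections import deque


def build_machine(patterns):
    """Build goto, failure and output tables for a set of patterns."""
    goto = [{}]    # goto[state][symbol] -> next state
    output = [[]]  # patterns recognized on entering a state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pat)

    fail = [0] * len(goto)  # depth-1 states fall back to the root
    queue = deque(goto[0].values())
    while queue:  # breadth-first, so shallower states are finished first
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def find_all(text, patterns):
    """Return (start index, pattern) for every occurrence of any pattern in text."""
    goto, fail, output = build_machine(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in output[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


print(find_all("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is scanned once from left to right, and the machine reports a match as soon as the last symbol of a pattern has been read.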
Yates R.A.B. and Navarro G. (1996) have given two new algorithms for on-line
approximate multiple pattern matching. In the first approach, for searching multiple
patterns, they superimpose the automata and use the result as a filter. In
the second approach, the patterns are partitioned into sub-patterns and searching is
done with no errors using a fast exact multi-pattern search algorithm. The running
time achieved is O(n) in both cases for moderate error level, pattern length and
number of patterns.
In syntactic pattern matching, the text is generated by a grammar G. For
one dimension, such grammars are generally known as string grammars. String
grammars are formally defined in Chapter 2. Usually we consider four types of
grammars as defined by Chomsky: type-0 or phrase structure grammars, type-1
or context-sensitive grammars, type-2 or context-free grammars, and type-3 or regular
grammars.
The syntactic pattern matching approach has advantages as well as disadvantages.
Formal grammars are used to represent the patterns. The production rules
describe how complex patterns or sub-patterns can be built from simpler patterns.
Formal language theory is well studied and understood, and it has applications
in many fields of computer science, which is an added advantage for the syntactic
approach. There are also some disadvantages. It may be very difficult to
represent complex patterns using this approach. The lack of automatic inference and
learning procedures also puts some limitations on the syntactic approach.
The membership problem for a grammar is a widely studied subject. The membership
problem is defined as follows: given a grammar G and a string w, is w generated by
G? The membership problem for type-0 grammars is undecidable. For context-sensitive
grammars it is PSPACE-complete. Two well-known algorithms for the membership
problem for context-free grammars (CFG) are the CYK algorithm (Harrison M.A., 1978)
and Earley's algorithm (Aho A., Ullman J.D., 1972). For regular grammars, membership
testing is simple.
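As an illustration of membership testing for context-free grammars, the sketch below follows the usual tabular CYK scheme. It assumes the grammar is already in Chomsky normal form and is supplied as two dictionaries; the names cyk_accepts, unit_rules and pair_rules are hypothetical and chosen only for this example.

```python
def cyk_accepts(word, unit_rules, pair_rules, start="S"):
    """Decide whether word is generated by a grammar in Chomsky normal form.

    unit_rules: dict terminal -> set of nonterminals A with a rule A -> terminal
    pair_rules: dict (B, C)   -> set of nonterminals A with a rule A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # the empty word is not handled by this sketch
    # table[i][j] holds the nonterminals that derive word[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = set(unit_rules.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for b in table[i][split - 1]:
                    for c in table[i + split][length - split - 1]:
                        table[i][length - 1] |= pair_rules.get((b, c), set())
    return start in table[0][n - 1]


# Grammar for { a^n b^n : n >= 1 }:  S -> A B | A X,  X -> S B,  A -> a,  B -> b
unit_rules = {"a": {"A"}, "b": {"B"}}
pair_rules = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}
print(cyk_accepts("aabb", unit_rules, pair_rules))  # True
print(cyk_accepts("abab", unit_rules, pair_rules))  # False
```

The three nested loops over length, start position and split point give the cubic dependence on the length of the input word that is characteristic of CYK.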
Myers E.W. and Miller W. (1989) have given an algorithm for the approximate
matching of regular expressions. Given a sequence A and a regular expression R, the
problem is to find a sequence matching R whose optimal alignment with A is the
highest scoring of all such sequences. An alignment between two sequences A and B
is a list of ordered pairs <(i1, j1), (i2, j2), ..., (it, jt)> of positions in A and B
whose indices strictly increase from one pair to the next. The cost w(k) of a gap of
length k is concave if, for all k > 1, w(k + 1) - w(k) ≤ w(k) - w(k - 1).
Myers G. (1995) has given an algorithm for approximately matching context-free
languages. Myers' algorithm runs in O(PN^2(N + log P)) time for approximately
matching a string of length N and a context-free language generated by a grammar
of size P. The algorithm generalizes the Cocke-Younger-Kasami (CYK) algorithm for
determining membership in a context-free language. The author has also considered
the problem where the gap costs are concave.
Osorio M. and Navarro J.A. (2004) have presented an algorithm which, given a
CFG and a string α, decides whether a string of the form βαγ belongs to the language
or not. There is a wide variety of applications where deciding if a string belongs to
the defined language has relevance but, due to "visualization" limitations, it is not
always possible to explicitly have complete strings of data. The proposed solution
involves a set of equations able to compute the sets of non-terminal symbols that
can produce every substring. The authors have designed this algorithm with dynamic
programming methods, and it can solve the problem in polynomial time.
Reghizzi S.C. and Pradella M. (2008) have given a new parsing algorithm, which
extends Cocke, Kasami and Younger's classical parsing technique for string
languages and preserves polynomial time complexity. The authors have also addressed
the problem of parsing pictures with Kolam grammars. To cope with subpictures
instead of substrings, the algorithm adds two dimensions to the Cocke, Kasami and
Younger (CKY) recognition matrix. It works bottom-up and recognizes subpictures
as a result of the application of grammar rules, starting from atomic subpictures.
Adjacent parsed subpictures are then combined. The authors prove that the algorithm
has polynomial time complexity. The result also applies to CF matrix grammars,
since their translation to Kolam form is straightforward.
Yong C.B. et al. (2011) have given an improved algorithm for the real-scale
indexing problem. For constant-sized alphabets, preprocessing takes time and space
polynomial in |T|, achieving an answering time of O(|P| + w), where U denotes the
number of matched positions and w < U. For the case of large-sized alphabets,
preprocessing can still be implemented within the same time and space bounds, while
the answering time is slightly increased to O(|P| + w + log|T|).
Lozano H.M. (2011) has given a variation of the Cocke-Younger-Kasami algorithm
(CYK algorithm) for the analysis of fuzzy context-free languages applied to
DNA strings. The author has considered a variation of the original CYK algorithm
and proves that the computational order of the new CYK algorithm is O(n).
The author also proves that the new algorithm uses only O(2n) memory locations. The
fuzzy context-free grammar is obtained from the DNA. The algorithm can be used to
find regulatory motifs, among other applications.
Sharma P. et al. (2016) have given a technique that consists of the orderly application
of various mathematical transformations on a one-dimensional time series obtained from
a 2D image array. These transformations include array to time-series conversion,
local maxima detection and joining, and calculation of the cumulative angle. The final
calculated parameter is a direct pointer to the image similarity. The proposed technique
performed well against traditional image comparison techniques under specific
circumstances. The technique can also be used to identify similar patterns in a single
image.
The natural generalization of multiple pattern matching is the one in which
the pattern is a two-dimensional array of symbols P of size m1 × m2 and the text
T is of size n1 × n2. The problem is to determine whether the pattern occurs as a
subarray of the text T or not. Bird R.S. (1977) has given the first algorithm to solve this
problem, where he assumed that the pattern P is of size m × m and the text T is of
size n × n. He used KMP as the base algorithm. The algorithm runs in O(n^2 + m^2)
time and uses O(n^2 + m) space.
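To make the two-dimensional problem statement concrete, the following sketch tests every candidate position directly. It is a brute-force illustration of the definition, not Bird's KMP-based algorithm, and it assumes both arrays are given as lists of equal-length strings; the name find_subarray is hypothetical.

```python
def find_subarray(text, pattern):
    """Return every (row, column) at which pattern occurs as a subarray of text.

    Both arguments are non-empty lists of equal-length strings, one string per row.
    """
    n1, n2 = len(text), len(text[0])
    m1, m2 = len(pattern), len(pattern[0])
    hits = []
    for r in range(n1 - m1 + 1):
        for c in range(n2 - m2 + 1):
            # compare the pattern row by row against the window at (r, c)
            if all(text[r + i][c:c + m2] == pattern[i] for i in range(m1)):
                hits.append((r, c))
    return hits


text = ["abab",
        "baba",
        "abab"]
pattern = ["ab",
           "ba"]
print(find_subarray(text, pattern))  # [(0, 0), (0, 2), (1, 1)]
```

This direct check can take time proportional to n1 · n2 · m1 · m2 in the worst case, which is the cost that automaton-based methods such as Bird's are designed to avoid.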
Krithivasan K. and Sitalakshmi R. (1986) have given an algorithm that improves on
that of Bird R.S. (1977); their algorithm runs better when the alphabet size is
large. Yates R.A.B. and Regnier M. (1993) have given an algorithm that finds
the occurrences of the pattern P of size m × m in a text T of size n × n in expected
time O(n^2/m).
Karkkainen J. and Ukkonen E. (1994) have given an algorithm for two and higher
dimensional pattern matching in optimal expected time. Their algorithm runs in
O((n^2/m^2) log_c m) time (where c is the alphabet size), which is optimal by Yao's
lower bound result (Yao A.C., 1979). Krithivasan K. and Sitalakshmi R. (1987) have
also given an algorithm for two-dimensional pattern matching in the presence of errors,
which reports the occurrences of pattern P in the text T allowing up to k mismatches.
Ranka S. and Heywood T. (1991) have improved this algorithm. The approximate
two-dimensional pattern matching problem has also been considered by Park K.
(1996).
Polcar T. and Melichar B. (2005) have given an algorithm that transforms a special
type of nondeterministic two-dimensional online tessellation automaton into a
deterministic one. This transformation is then used in a new general approach to
exact and approximate two-dimensional pattern matching based on two-dimensional
online tessellation automata, which generalizes the pattern matching approach based on
finite automata well known from the one-dimensional case.
Zdarek J. and Melichar B. (2008) have given a general concept of two-dimensional
pattern matching using conventional (one-dimensional) finite automata. Then two
particular models and methods, implementations of the general principle, are
presented. The first of these two models presents an automata-based version of the Bird
and Baker approach with lower space complexity than the original algorithm. The
second introduces a new model for two-dimensional approximate pattern matching
using the two-dimensional Hamming distance.
Terrier V. (2003) has considered the relationships between the classes of
two-dimensional languages defined by deterministic on-line tessellation automata and by
real-time two-dimensional cellular automata with Moore and Von Neumann
neighborhoods. The work generalizes a result known for one-dimensional cellular
automata to two-dimensional cellular automata with Von Neumann neighborhood:
the class of real-time cellular automata is closed under rotation by 180° if and only if
real-time cellular automata are equivalent to linear-time cellular automata.
Matrix grammars are studied in (Siromoney G., Siromoney R. and Krithivasan K.,
1972) and are used as a mechanism to generate rectangular arrays. With
this type of grammar, first a horizontal string of intermediates is derived and then
the vertical columns of the array are derived. In (Siromoney G., Siromoney R.
and Krithivasan K., 1972), all four types of grammars of the Chomsky hierarchy are
considered in the horizontal direction and only regular grammars are considered in
the vertical direction.
Subramanian K.G. et al. (2008) have introduced a new theoretical model of
grammatical picture generation called extended 2D context-free picture grammar
(E2DCFPG), generating rectangular picture arrays of symbols. This model allows
variables in the grammar and uses the squeezing mechanism of forming the picture
language over terminal symbols. The extended picture grammar model E2DCFPG is
shown to have more picture generative power than the P2DCFPG and certain other
existing 2D models.
Nagar A.K. et al. (2016) have introduced a variant of extended two-dimensional
context-free picture grammar (E2DCFPG), called (l/u)E2DCFPG, which allows
rewriting only the leftmost column or the uppermost row of variables in a picture
array. Several theoretical properties of (l/u)E2DCFPG have been obtained and an
application in generating digitized picture arrays has been discussed.
Siromoney G. and Siromoney R. (1976) have introduced grammatical models for
generating hexagonal arrays. Properties of special classes of hexagons have been
studied, and a new kind of catenation called arrowhead catenation is defined so that
the hexagonal shape is maintained in every generation. Perceptual twins of a given
set of blocks are obtained formally from the grammar generating the original set of
blocks.
Inoue K. and Nakamura A. (1977) have given a new type of acceptor called the
"two-dimensional on-line tessellation acceptor" (denoted by "2-ota") and examined several
properties of the 2-ota. The 2-ota might be considered as a real-time mode of the
rectangular array bounded cellular space introduced by Smith and Beyer. First, several
fundamental properties of the 2-ota are examined. The accepting power of the 2-ota
is then compared with that of the two-dimensional finite automaton introduced by Blum
and Hewitt. The main results are: the class of sets accepted by nondeterministic
2-ota's properly contains that of sets accepted by nondeterministic two-dimensional
finite automata, and the class of sets accepted by deterministic 2-ota's is incomparable
with that of sets accepted by deterministic (or nondeterministic) two-dimensional
finite automata.
Toda M., Inoue K. and Takanami I. (1983) have given an algorithm that solves the
array matching problem efficiently by using a two-dimensional on-line tessellation
acceptor. The array matching problem can be solved in exactly m + n - 1 steps for
m × n text arrays. Based on ideas due to Baker, an application of the two-dimensional
on-line tessellation acceptor (2-dota) is presented for very rapid on-line detection of
occurrences of a fixed set of keyarrays as embedded subarrays in a text array. The
main part of the algorithm described in the research consists of constructing two
finite state pattern (string) matching machines from the keyarrays. By combining these
two finite state pattern matching machines, the authors have constructed the 2-dota.
Dersanambika K.S. et al. (2003) have introduced hexagonal Wang tiles, local
hexagonal picture languages and recognizable hexagonal picture languages. The
authors use hexagonal Wang tiles to introduce hexagonal Wang systems, a formalism to
recognize hexagonal picture languages. It is shown that the family of hexagonal
picture languages defined by hexagonal Wang systems coincides with the family of
hexagonal picture languages recognized by hexagonal tiling systems. Similar to
hv-domino systems, the authors define xyz-domino systems and prove that recognizable
hexagonal picture languages are characterized as projections of xyz-local picture
languages. The authors have also considered hexagonal pictures over a one-letter
alphabet.
Subramanian K.G. and Nagar A.K. (2008) have introduced the pure 2D context-free
grammar, which generates rectangular picture arrays of symbols. The model is based
on the notion of pure context-free grammars of formal string language theory. The
generative power of this model in comparison to certain other related models has
been examined. The authors have also associated a regular control language with a pure
2D CFG and noticed that the generative power increases. The authors have obtained
certain closure properties.
Subramanian K.G. et al. (2009) have introduced a new syntactic model, called
pure two-dimensional context-free grammar (P2DCFG), based on
the notion of pure context-free string grammar. The rectangular picture generative
power of this 2D grammar model has been investigated. Certain closure properties have
been obtained. An analogue of this 2D grammar model, called pure 2D hexagonal
context-free grammar (P2DHCFG), has also been considered to generate hexagonal picture
arrays on triangular grids.
Subramanian K.G. et al. (2010) have investigated the generative power of the
array-rewriting P system with pure 2D context-free rules by comparing it with other
2D grammar models, thus bringing out the suitability of this P system model for
picture array generation.
Cece G. and Giorgetti A. (2011) have introduced the notion of simulation over
a class of automata which recognize 2D languages (languages of arrays of letters).
This class of two-dimensional On-line Tessellation Automata (2OTA) accepts the same
class of languages as the class of tiling systems, considered as the natural extension of
classical regular word languages to the 2D case. The authors have proved that simulation
over 2OTA implies language inclusion. Even if the existence of a simulation relation
between two 2OTA has been shown to be an NP-complete problem in time, this is an
important result since the inclusion problem is undecidable in general in this class
of languages. The authors have also proved the existence of a unique maximal
autosimulation relation in a given 2OTA and the existence of a unique minimal 2OTA
which is simulation equivalent to this given 2OTA, both computable in polynomial
time.
Bersani M.M. et al. (2013) have proved some closure properties of pure
two-dimensional context-free grammars, for which they also prove that parsing is NP-hard.
Some comparisons have been drawn with other interesting picture grammars such as tile
grammars, Prusa grammars and local languages, clarifying, in some cases, their
mutual relationship with respect to expressiveness. The authors have focused on studying
a very simple yet expressive non-isometric grammar formalism, called (regularly
controlled) pure two-dimensional context-free grammars ((R)P2DCFG), which is defined
by Subramanian et al. (2009). The grammars of this family are endowed with two
sorts of rewriting rules, which are involved in the process of derivation of pictures
separately because they work either on rows or on columns. Tables are sets of rules that
can be applied to rewrite rows or columns. Their application produces a new picture
from an existing one, where either one row or one column at a time is replaced. When
a table is applied to a row (or column), all symbols in it are rewritten at the
same time and the row (or column) is replaced by an array of at least one row
(column). Although the nature of the rules (which do not use non-terminal symbols) limits
the expressive power of the formalism, grammars can be endowed with a (regular)
control language which defines legal sequences of tables to be used in generating the
pictures. The resulting family of grammars is rather expressive, since a set of
symbols can be used as non-terminals which are then removed from pictures by
suitable sequences of tables that are controlled by the (control) language.
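The effect of applying a column table can be illustrated with a small sketch. It makes simplifying assumptions that are not taken from the cited papers: a picture is a list of equal-length strings, a table offers exactly one rule per symbol (real tables may offer several), and each column symbol is replaced by a horizontal string so that one column expands into one or more columns. The name apply_column_table is hypothetical.

```python
def apply_column_table(picture, col, table):
    """Rewrite every symbol of column `col` in parallel using a column table.

    picture: list of equal-length strings, one per row
    table:   dict symbol -> replacement string (simplified: one rule per symbol)
    """
    if not 0 <= col < len(picture[0]):
        raise IndexError("column index out of range")
    widths = {len(rhs) for rhs in table.values()}
    if len(widths) != 1:
        # unequal right-hand sides would break the rectangular shape
        raise ValueError("right-hand sides of a column table must have equal length")
    if any(row[col] not in table for row in picture):
        raise ValueError("table is not applicable to this column")
    # every row is rewritten at the same position, so the picture stays rectangular
    return [row[:col] + table[row[col]] + row[col + 1:] for row in picture]


picture = ["aba",
           "cdc"]
table = {"b": "bab", "d": "dcd"}
print(apply_column_table(picture, 1, table))  # ['ababa', 'cdcdc']
```

A row table would act symmetrically, replacing each symbol of one row by a vertical string of a common height.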
Kamraj T. and Thomas D.G. (2013) have introduced an automata model for the
recognizability of hexagonal picture languages, called Hexagonal Wang Automata
(HWA), which is based on a variant of hexagonal Wang tiles. The authors provide a wide
range of polite scanning strategies and prove that nondeterministic HWA with any
polite scanning strategy are equivalent to hexagonal tiling systems or hexagonal online
tessellation acceptors. The authors have also introduced the notion of deterministic
recognizability in HWA and presented some comparison results.
Anitha P. and Dersanambika K.S. (2014) have introduced a model of automaton
for hexagonal picture language recognition, based on tiles, called the Hexagonal
Wang Automaton. Hexagonal Wang automata combine features of both online
tessellation acceptors and 6-way automata.
Kamraj T. and Thomas D.G. (2014) have introduced hexagonal picture generating
devices which extend the hexagonal array token Petri net structures. The authors
have considered the Adjunct Hexagonal Array Token Petri Net Structures (AHPN) model
along with a control feature called inhibitor arcs and compared it with some expressive
hexagonal picture generating and recognizing models with respect to generative
power.
Krivka Z. et al. (2014) observe that, among the large variety of approaches to
generating picture languages, the notion of pure two-dimensional context-free grammar
(P2DCFG) represents a simple yet expressive non-isometric language generator of
picture arrays. The authors have introduced a new variant of P2DCFGs that generates
picture arrays in a leftmost way. They have concentrated on determining its
generative power by comparing it with the power of other picture generators, and
have also examined the power of these generators when rewriting is regulated by
control languages. The authors have discussed leftmost rewriting in terms of P2DCFG.
In other words, while a P2DCFG allows rewriting any column or any row of a picture
array by the rules of an applicable column rule table or row rule table respectively, in
the variant under investigation in that research, only the leftmost column
or the uppermost row of an array is rewritten. The authors refer to the P2DCFG
working under this derivation mode as (l/u)P2DCFG and to the corresponding family of
picture languages as (l/u)P2DCFL. The authors have demonstrated
that (l/u)P2DCFL and the family of picture languages generated by P2DCFGs are
incomparable, and that (l/u)P2DCFL is not closed under union and intersection.
The effect of regulating the rewriting in (l/u)P2DCFGs by control languages has also
been examined, and it has been demonstrated that this regulation results in an increase
in the generative power.
Subramanian K.G. et al. (2015) have introduced another variant of P2DCFG
that corresponds to rightmost rewriting in string context-free grammars. The new
grammar is called (r/d)P2DCFG and rewrites in parallel all the symbols only in the
rightmost column or the lowermost row of a picture array by a set of context-free
rules. Unlike the case of string context-free grammars, the picture language families
of P2DCFG and the two variants (l/u)P2DCFG and (r/d)P2DCFG are mutually
incomparable, although they are not disjoint. The authors have also examined the effect
of regulating the rewriting in an (r/d)P2DCFG by suitably adapting two well-known
control mechanisms in string grammars, namely control words and matrix control.
Sujathakumari K. and Dersanambika K.S. (2016) have introduced different types
of hexagonal array grammar systems and compared the generative capacities of these
grammar systems according to the number of components and modes of derivation.
The authors observed that the difference in the generative capacity is based on
the fundamental difference between regular and context-free array grammars. The
authors have also introduced the cooperating distributed hexagonal array grammar system
and examined the generative power of the system.
Samdanielthompson G. et al. (2017) have introduced a grammar system, called
alphabetic flat splicing pure context-free grammar system (AFSPCFGS), as a new
model of language generation, based on the operation of alphabetic flat splicing on
words and pure context-free rules. The authors have derived certain comparison results
that bring out the generative power of AFSPCFGS and, as an application, constructed
an AFSPCFGS generating "floor-design" pictures.
There are many algorithms for syntactic pattern matching, and they consider
rectangular pattern arrays. So far, no research has been done on designing
a pure context-free grammar for generating a set of hexagonal image patterns. This
research makes use of a pure context-free grammar to generate hexagonal patterns,
which are then recognized by a hexagonal online tessellation acceptor.