Chapter 3 Review of Literature

There are many well-known string matching algorithms for one-dimensional text, such as Knuth-Morris-Pratt and Boyer-Moore. There is also a string matching algorithm that constructs a finite automaton for the pattern and finds all occurrences of the pattern in the text.

There are many variations of the string matching problem. One such variation is to find the occurrences of the pattern P in the text T allowing mismatches. This problem is also called approximate string matching and can be defined as follows: given a text T of length n, a pattern P of length m and an integer k, find all occurrences of the pattern P in the text T with at most k mismatches. Landau G.M. and Vishkin U. (1986) have given an algorithm which runs in O(k(m log m + n)) time. Landau G.M. and Vishkin U. (1988) have given an algorithm which runs in O(m + nk^2) time for an alphabet whose size is fixed. For general input, the algorithm runs in O(m log m + nk^2) time. In both cases, the space requirement is O(m).

Several algorithms have been presented for approximate string matching that use O(kn) comparisons in the worst case or in the average case by taking advantage of the properties of the dynamic programming paradigm. Yates R.A.B. and Perleberg C.H. (1996) have given an algorithm for approximate string matching that runs in linear time for most practical cases. Yates R.A.B. and Navarro G. (1996) have given a new algorithm for on-line approximate string matching. Their algorithm is based on the simulation of a non-deterministic finite automaton built from the pattern and using the text as input.

The next generalization of the string matching problem is to find the occurrences of multiple patterns P_1, P_2, P_3, ..., P_r in the given text T. This problem is also known as multiple pattern matching. Aho A. and Corasick M.J. (1975) have given an algorithm to solve this problem.
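
To make the multiple pattern matching problem concrete, the following is a minimal sketch of a finite state pattern matching machine in the style of Aho and Corasick. It is an illustration under stated assumptions, not the published pseudocode: the function names `build_machine` and `search` and the table layout (`goto`, `fail`, `out`) are chosen here for the example.

```python
from collections import deque


def build_machine(patterns):
    """Build goto, failure and output tables for a set of patterns.

    Hypothetical helper for illustration; state 0 is the start state.
    """
    goto = [{}]   # goto[state][symbol] -> next state (a trie of the patterns)
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> patterns that end when this state is entered
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first traversal: failure links of shallower states are ready
    # before they are needed by deeper states.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_machine(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

For example, `search("ushers", ["he", "she", "his", "hers"])` reports "she" at index 1 and both "he" and "hers" at index 2. The text is scanned once, and each character triggers only table lookups and failure-link jumps.
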
The Aho-Corasick algorithm has been used to improve the speed of a library bibliography search program by a factor of 5 to 10. The main idea is to build a finite state pattern matching machine for the multiple patterns; the text is then given as input to this machine, which signals whenever it has found a match among the given patterns. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the patterns. Yates R.A.B. and Navarro G. (1996) have given two new algorithms for on-line approximate multiple pattern matching. In the first approach, they superimpose the automata of the patterns and use the result as a filter. In the second approach, the patterns are partitioned into sub-patterns and searching is done with no errors using a fast exact multi-pattern search algorithm. The running time achieved is O(n) in both cases for moderate error level, pattern length and number of patterns.

In syntactic pattern matching, the text is generated by a grammar G. In one dimension, such grammars are generally known as string grammars. String grammars are formally defined in Chapter 2. Usually four types of grammars are considered, as defined by Chomsky: type-0 or phrase structure grammars, type-1 or context-sensitive grammars, type-2 or context-free grammars and type-3 or regular grammars.

The syntactic pattern matching approach has advantages as well as disadvantages. Formal grammars are used to represent the patterns. The production rules describe how complex patterns or sub-patterns can be built from simpler patterns. Formal language theory is well studied and understood, and it has applications in many fields of computer science, which is an added advantage of the syntactic approach. There are also some disadvantages. It may be very difficult to represent complex patterns using this approach.
The lack of automatic inference and learning procedures also puts some limitations on the syntactic approach.

The membership problem for a grammar is a widely studied subject. It is defined as follows: given a grammar G and a string w, is w generated by G? The membership problem for type-0 grammars is undecidable. For context-sensitive grammars it is PSPACE-complete. Two well-known algorithms for the membership problem for CFGs are the CYK algorithm (Harrison M.A., 1978) and Earley's algorithm (Aho A., Ullman J.D., 1972). For regular grammars, membership testing is simple.

Myers E.W. and Miller W. (1989) have given an algorithm for the approximate matching of regular expressions. Given a sequence A and a regular expression R, the problem is to find a sequence matching R whose optimal alignment with A is the highest scoring of all such sequences. An alignment between two sequences A = a_1 a_2 ... a_M and B = b_1 b_2 ... b_N is a list of ordered pairs <(i_1, j_1), (i_2, j_2), ..., (i_t, j_t)> such that i_k [...] 1, w(k + 1) - w(k) < w(k) - w(k - 1).

Myers G. (1995) has given an algorithm for approximately matching context-free languages. Myers' algorithm runs in O(PN^2(N + log P)) time for approximately matching a string of length N and a context-free language generated by a grammar of size P. The algorithm generalizes the Cocke-Younger-Kasami (CYK) algorithm for determining membership in a context-free language. The author has also considered the problem where the gap costs are concave.

Osorio M. and Navarro J.A. (2004) have presented an algorithm which, given a CFG and a string α, decides whether a string of the form βαγ belongs to the language. There is a wide variety of applications where deciding whether a string belongs to the defined language is relevant but, due to "visualization" limitations, it is not always possible to have complete strings of data explicitly.
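
Since several of the works surveyed here build on the CYK algorithm, a minimal sketch of classical CYK membership testing may help. It is the textbook dynamic programme, not the algorithm of any one cited paper. It assumes the grammar is given in Chomsky normal form, and the argument names `unary` and `binary` are hypothetical conventions chosen for this example.

```python
def cyk(word, unary, binary, start="S"):
    """Decide whether `word` is generated by a grammar in Chomsky normal form.

    unary:  dict mapping a terminal to the set of non-terminals A with A -> terminal
    binary: list of triples (A, B, C) standing for the rule A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # the empty word is not handled in this sketch
    # table[i][l - 1] holds the non-terminals deriving word[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = set(unary.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for head, b, c in binary:
                    if b in left and c in right:
                        table[i][length - 1].add(head)
    return start in table[0][n - 1]


# Example grammar for { a^n b^n : n >= 1 }:
#   S -> A B | A X,  X -> S B,  A -> a,  B -> b
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = [("S", "A", "B"), ("S", "A", "X"), ("X", "S", "B")]
```

With this grammar, `cyk("aabb", UNARY, BINARY)` returns `True` and `cyk("aab", UNARY, BINARY)` returns `False`. The table entry for every substring is exactly the set of non-terminal symbols that can produce it, which is the quantity the substring-oriented variants in this section also compute.
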
The solution proposed by Osorio and Navarro involves a set of equations that compute the sets of non-terminal symbols that can produce every substring. The authors have designed this algorithm with dynamic programming methods, and it solves the problem in polynomial time.

Reghizzi S.C. and Pradella M. (2008) have given a new parsing algorithm, which extends the classical Cocke, Kasami and Younger parsing technique for string languages and preserves polynomial time complexity. The authors have also addressed the problem of parsing pictures with Kolam grammars. To cope with sub-pictures instead of substrings, the algorithm adds two dimensions to the Cocke, Kasami and Younger (CKY) recognition matrix. It works bottom-up and recognizes sub-pictures as a result of the application of grammar rules, starting from atomic sub-pictures. Adjacent parsed sub-pictures are then combined. The authors prove that the algorithm has polynomial time complexity. The result also applies to CF matrix grammars, since their translation to Kolam form is straightforward.

Yong C.B. et al. (2011) have given an improved algorithm for the real-scale indexing problem. For constant-sized alphabets, preprocessing takes O(|T|^2) time and space, achieving an answering time of O(|P| + w), where U denotes the number of matched positions and w < U. For large alphabets, preprocessing can still be implemented in O(|T|^2) time and space, while the answering time is slightly increased to O(|P| + w + log|T|).

Lozano H.M. (2011) has given a variation of the Cocke-Younger-Kasami (CYK) algorithm for the analysis of fuzzy context-free languages applied to DNA strings. The author proves that the computational order of the new CYK algorithm is O(n) and that it uses only O(2n) memory locations. The fuzzy context-free grammar is obtained from the DNA.
Lozano's algorithm can be used to find regulatory motifs, among other applications.

Sharma P. et al. (2016) have given a technique that consists of the orderly application of various mathematical transformations to a one-dimensional time series obtained from a 2D image array. These transformations include array to time-series conversion, local maxima detection and joining, and calculation of the cumulative angle. The final calculated parameter is a direct pointer to the image similarity. The proposed technique performed well against traditional image comparison techniques under specific circumstances. The technique can also be used to identify similar patterns in a single image.

The natural generalization of multiple pattern matching is the one in which the pattern is a two-dimensional array of symbols P of size m1 x m2 and the text T is of size n1 x n2. The problem is to determine whether the pattern occurs as a sub-array of the text T or not. Bird R.S. (1977) has given the first algorithm to solve this problem, where he assumed that the pattern P is of size m x m and the text T is of size n x n. He used KMP as the base algorithm. The algorithm runs in O(n^2 + m^2) time and uses O(n^2 + m) space. Krithivasan K. and Sitalakshmi R. (1986) have given an improved version of Bird's algorithm, which runs better when the alphabet size is large. Yates R.A.B. and Regnier M. (1993) have given an algorithm that finds the occurrences of a pattern P of size m x m in a text T of size n x n in expected time O(n^2/m). Karkkainen J. and Ukkonen E. (1994) have given an algorithm for two and higher dimensional pattern matching in optimal expected time. Their algorithm runs in O((n^2/m^2) log_c m) time (where c is the alphabet size), which is optimal by Yao's lower bound result (Yao A.C., 1979). Krithivasan K. and Sitalakshmi R. (1987) have also given an algorithm for 2-dimensional pattern matching in the presence of errors.
In that work, they reported the occurrences of the pattern P in the text T allowing up to k mismatches. Ranka S. and Heywood T. (1991) have improved this algorithm. The approximate two-dimensional pattern matching problem has also been considered by Park K. (1996).

Polcar T. and Melichar B. (2005) have given an algorithm that transforms a special type of non-deterministic two-dimensional on-line tessellation automaton into a deterministic one. This transformation is then used in a new general approach to exact and approximate two-dimensional pattern matching based on two-dimensional on-line tessellation automata, which generalizes the finite-automaton approach to pattern matching well known from the one-dimensional case.

Zdarek J. and Melichar B. (2008) have given a general concept of two-dimensional pattern matching using conventional (one-dimensional) finite automata. Two particular models and methods, implementations of the general principle, are then presented. The first of these models presents an automata-based version of the Bird and Baker approach with lower space complexity than the original algorithm. The second introduces a new model for two-dimensional approximate pattern matching using the two-dimensional Hamming distance.

Terrier V. (2003) has considered the relationships between the classes of two-dimensional languages defined by deterministic on-line tessellation automata and by real-time two-dimensional cellular automata with Moore and Von Neumann neighborhoods. The work generalizes a result known for one-dimensional cellular automata to two-dimensional cellular automata with the Von Neumann neighborhood: the class of real-time cellular automata is closed under rotation by 180° if and only if real-time cellular automata are equivalent to linear-time cellular automata.

Matrix grammar is studied in (Siromoney G., Siromoney R. and Krithivasan K.,
1972) and is used as a generation mechanism for rectangular arrays. With this type of grammar, first a horizontal string of intermediates is derived and then the vertical columns of the array are derived. In (Siromoney G., Siromoney R. and Krithivasan K., 1972), all four types of grammars of the Chomsky hierarchy are considered in the horizontal direction and only regular grammars are considered in the vertical direction.

Subramanian K.G. et al. (2008) have introduced a new theoretical model of grammatical picture generation called extended 2D context-free picture grammar (E2DCFPG), which generates rectangular picture arrays of symbols. This model allows variables in the grammar and uses the squeezing mechanism to form the picture language over terminal symbols. The extended picture grammar model E2DCFPG is shown to have more picture generative power than the P2DCFPG and certain other existing 2D models.

Nagar A.K. et al. (2016) have introduced a variant of the extended two-dimensional context-free picture grammar (E2DCFPG), called (l/u) E2DCFPG, which allows rewriting only the leftmost column or the uppermost row of variables in a picture array. Several theoretical properties of (l/u) E2DCFPG have been obtained and an application to generating digitized picture arrays has been discussed.

Siromoney G. and Siromoney R. (1976) have introduced grammatical models for generating hexagonal arrays. Properties of special classes of hexagons have been studied, and a new kind of catenation called arrowhead catenation is defined so that the hexagonal shape is maintained in every generation. Perceptual twins of a given set of blocks are obtained formally from the grammar generating the original set of blocks.

Inoue K. and Nakamura A.
(1977) have proposed a new type of acceptor called the "two-dimensional on-line tessellation acceptor" (denoted by "2-ota") and examined several of its properties. The 2-ota might be considered as a real-time mode of the rectangular array bounded cellular space introduced by Smith and Beyer. First, several fundamental properties of the 2-ota are examined. The accepting power of the 2-ota is then compared with that of the two-dimensional finite automaton introduced by Blum and Hewitt. The main results are: the class of sets accepted by nondeterministic 2-ota's properly contains that of sets accepted by nondeterministic two-dimensional finite automata; and the class of sets accepted by deterministic 2-ota's is incomparable with that of sets accepted by deterministic (or nondeterministic) two-dimensional finite automata.

Toda M., Inoue K. and Takanami I. (1983) have shown that the array matching problem can be solved efficiently by using a two-dimensional on-line tessellation acceptor. The array matching problem can be solved in exactly m + n - 1 steps for m by n text arrays. Based on ideas due to Baker, an application of the two-dimensional on-line tessellation acceptor (2-dota) is presented for very rapid on-line detection of a fixed set of keyarrays as embedded subarrays in a text array. The main part of the algorithm consists of constructing two finite state pattern (string) matching machines from the keyarrays. By combining these two machines, the authors construct the 2-dota.

Dersanambika K.S. et al. (2003) have introduced hexagonal Wang tiles, local hexagonal picture languages and recognizable hexagonal picture languages. The authors use hexagonal Wang tiles to introduce hexagonal Wang systems, a formalism to recognize hexagonal picture languages.
It is shown that the family of hexagonal picture languages defined by hexagonal Wang systems coincides with the family of hexagonal picture languages recognized by hexagonal tiling systems. Similar to hv-domino systems, the authors define xyz-domino systems and prove that recognizable hexagonal picture languages are characterized as projections of xyz-local picture languages. The authors have also considered hexagonal pictures over a one-letter alphabet.

Subramanian K.G. and Nagar A.K. (2008) have introduced the pure 2D context-free grammar, which generates rectangular picture arrays of symbols. The model is based on the notion of pure context-free grammars of formal string language theory. The generative power of this model in comparison with certain other related models has been examined. The authors have also associated a regular control language with a pure 2D CFG and observed that the generative power increases. Certain closure properties have been obtained.

Subramanian K.G. et al. (2009) have introduced a new syntactic model, called pure two-dimensional context-free grammar (P2DCFG), based on the notion of pure context-free string grammar. The rectangular picture generative power of this 2D grammar model has been investigated. Certain closure properties have been obtained. An analogue of this 2D grammar model, called pure 2D hexagonal context-free grammar (P2DHCFG), has also been considered to generate hexagonal picture arrays on triangular grids.

Subramanian K.G. et al. (2010) have investigated the generative power of the array-rewriting P system with pure 2D context-free rules by comparing it with other 2D grammar models, thus bringing out the suitability of this P system model for picture array generation.

Cece G. and Giorgetti A.
(2011) have introduced the notion of simulation over a class of automata which recognize 2D languages (languages of arrays of letters). This class of two-dimensional on-line tessellation automata (2OTA) accepts the same class of languages as the class of tiling systems, considered as the natural extension of classical regular word languages to the 2D case. The authors prove that simulation over 2OTA implies language inclusion. Even though the existence of a simulation relation between two 2OTA is shown to be an NP-complete problem in time, this is an important result since the inclusion problem is undecidable in general in this class of languages. The authors have also proved the existence of a unique maximal autosimulation relation in a given 2OTA and the existence of a unique minimal 2OTA which is simulation equivalent to this given 2OTA, both computable in polynomial time.

Bersani M.M. et al. (2013) have proved some closure properties of pure two-dimensional context-free grammars, for which they also prove that parsing is NP-hard. Some comparisons have been drawn with other interesting picture grammars such as tile grammars, Prusa grammars and local languages, clarifying, in some cases, their mutual relationship with respect to expressiveness. The authors have focused on studying a very simple yet expressive non-isometric grammar formalism, called (regularly controlled) pure two-dimensional context-free grammars ((R)P2DCFG), which is defined by Subramanian et al. (2009). The grammars of this family are endowed with two sorts of rewriting rules, which are involved separately in the derivation of pictures because they work either on rows or on columns. Tables are sets of rules that can be applied to rewrite rows or columns. Their application produces a new picture from an existing one, where either one row or one column at a time is replaced.
When a table is applied to a row (or column), all symbols in it are rewritten at the same time and the row (or column) is replaced by an array of at least one row (column). Although the nature of the rules (which do not use non-terminal symbols) limits the expressive power of the formalism, grammars can be endowed with a (regular) control language which defines legal sequences of tables to be used in generating the pictures. The resulting family of grammars is rather expressive, since a set of symbols can be used as non-terminals, which are then removed from pictures by suitable sequences of tables that are controlled by the (control) language.

Kamraj T. and Thomas D.G. (2013) have introduced an automata model for the recognizability of hexagonal picture languages, called Hexagonal Wang Automata (HWA), which is based on a variant of hexagonal Wang tiles. The authors provide a wide range of polite scanning strategies and prove that the non-deterministic HWA with any polite scanning strategy are equivalent to hexagonal tiling systems or hexagonal on-line tessellation acceptors. The authors have also introduced the notion of deterministic recognizability in HWA and present some comparison results.

Anitha P. and Dersanambika K.S. (2014) have introduced a model of automaton for hexagonal picture language recognition, which is based on tiles, called the Hexagonal Wang Automaton. Hexagonal Wang automata combine features of both on-line tessellation acceptors and 6-way automata.

Kamraj T. and Thomas D.G. (2014) have introduced hexagonal picture generating devices which extend the hexagonal array token Petri net structures. The authors have considered the Adjunct Hexagonal Array Token Petri Net Structures (AHPN) model along with a control feature called inhibitor arcs and compared it with some expressive hexagonal picture generating and recognizing models with respect to generative power.

Krivka Z. et al.
(2014) have observed that, among the large variety of approaches to generating picture languages, the pure two-dimensional context-free grammar (P2DCFG) represents a simple yet expressive non-isometric language generator of picture arrays. The authors have introduced a new variant of P2DCFGs that generates picture arrays in a leftmost way. They have concentrated on determining its generative power by comparing it with the power of other picture generators, and have also examined the power of these generators when rewriting is regulated by control languages. The authors discuss leftmost rewriting in terms of P2DCFG. In other words, while a P2DCFG allows rewriting any column or any row of a picture array by the rules of an applicable column rule table or row rule table respectively, in the variant under investigation only the leftmost column or the uppermost row of an array is rewritten. The authors refer to a P2DCFG working under this derivation mode as (l/u) P2DCFG and to the corresponding family of picture languages as (l/u) P2DCFL. They have demonstrated that (l/u) P2DCFL and the family of picture languages generated by P2DCFGs are incomparable, and that (l/u) P2DCFL is not closed under union and intersection. The effect of regulating rewriting in (l/u) P2DCFGs by control languages has also been examined, and it has been demonstrated that this regulation results in an increase in generative power.

Subramanian K.G. et al. (2015) have introduced another variant of P2DCFG that corresponds to rightmost rewriting in string context-free grammars. The new grammar is called (r/d) P2DCFG and rewrites in parallel all the symbols only in the rightmost column or the lowermost row of a picture array by a set of context-free rules.
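
The parallel rewriting of a column or row described above can be illustrated with a small sketch. It rests on simplifying assumptions that are not taken from the cited papers: a picture is a list of equal-length strings, a table is a Python dict with exactly one rule per symbol, and all function names are hypothetical. A column table maps each symbol to a horizontal string, and all right-hand sides must have the same length so that the result stays rectangular.

```python
def rewrite_column(picture, col, table):
    """Rewrite every symbol of column `col` in parallel using `table`."""
    widths = {len(rhs) for rhs in table.values()}
    if len(widths) != 1:
        raise ValueError("right-hand sides of a table must have equal length")
    if any(row[col] not in table for row in picture):
        raise ValueError("table is not applicable to this column")
    return [row[:col] + table[row[col]] + row[col + 1:] for row in picture]


def rewrite_row(picture, row, table):
    """Rewrite row `row`; each right-hand side is read top to bottom."""
    transposed = ["".join(c) for c in zip(*picture)]
    rewritten = rewrite_column(transposed, row, table)
    return ["".join(c) for c in zip(*rewritten)]


def leftmost_column(picture, table):
    """Column step of the (l/u) mode: only the leftmost column is rewritten."""
    return rewrite_column(picture, 0, table)


def rightmost_column(picture, table):
    """Column step of the (r/d) mode: only the rightmost column is rewritten."""
    return rewrite_column(picture, len(picture[0]) - 1, table)
```

For instance, applying the table `{"a": "aa"}` to the leftmost column of the picture `["ab", "ab"]` yields `["aab", "aab"]`, and applying the row table `{"a": "ac", "b": "bd"}` to the single row of `["ab"]` yields the two rows `["ab", "cd"]`. An unrestricted P2DCFG would allow `rewrite_column` and `rewrite_row` at any index, while the two variants restrict the index to one border of the picture.
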
Unlike the case of string context-free grammars, the picture language families of P2DCFG and the two variants (l/u) P2DCFG and (r/d) P2DCFG are mutually incomparable, although they are not disjoint. The authors have also examined the effect of regulating the rewriting in an (r/d) P2DCFG by suitably adapting two well-known control mechanisms in string grammars, namely control words and matrix control.

Sujathakumari K. and Dersanambika K.S. (2016) have introduced different types of hexagonal array grammar systems, and the generative capacities of these grammar systems are compared according to the number of components and modes of derivation. The authors observed that the difference in generative capacity is based on the fundamental difference between regular and context-free array grammars. The authors have introduced the cooperating distributed hexagonal array grammar system and examined its generative power.

Samdanielthompson G. et al. (2017) have introduced a grammar system, called alphabetic flat splicing pure context-free grammar system (AFSPCFGS), as a new model of language generation, based on the operation of alphabetic flat splicing on words and pure context-free rules. The authors have derived certain comparison results that bring out the generative power of AFSPCFGS and, as an application, constructed an AFSPCFGS generating "floor-design" pictures.

There are many algorithms for syntactic pattern matching, and they consider rectangular pattern arrays. So far, no research has been done on designing a pure context-free grammar for generating a set of hexagonal image patterns. This research makes use of a pure context-free grammar to generate hexagonal patterns, which will be recognized by a hexagonal on-line tessellation acceptor.
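
As background for the tessellation acceptors surveyed in this chapter, the following is a minimal sketch of how a deterministic on-line tessellation acceptor can be simulated on a rectangular picture. The hexagonal acceptor used in this research is not shown. The sketch assumes the usual formulation in which the state of a cell depends on the states of its upper and left neighbours and on its own input symbol. The names `run_2ota`, `delta` and the border marker `"#"` are hypothetical.

```python
def run_2ota(picture, delta, final_states, border="#"):
    """Simulate a deterministic 2-OTA on a rectangular picture.

    picture: list of equal-length strings
    delta:   function (up_state, left_state, symbol) -> state
    Returns (accepted, steps), where steps counts the anti-diagonals processed.
    """
    rows, cols = len(picture), len(picture[0])
    state = [[None] * cols for _ in range(rows)]
    steps = 0
    # Cells on the same anti-diagonal i + j = d change state in the same step.
    for d in range(rows + cols - 1):
        for i in range(max(0, d - cols + 1), min(rows, d + 1)):
            j = d - i
            up = state[i - 1][j] if i > 0 else border
            left = state[i][j - 1] if j > 0 else border
            state[i][j] = delta(up, left, picture[i][j])
        steps += 1
    return state[rows - 1][cols - 1] in final_states, steps


def all_a(up, left, symbol):
    """Example transition function: stay 'ok' only while every symbol is 'a'."""
    good = symbol == "a" and up in ("#", "ok") and left in ("#", "ok")
    return "ok" if good else "bad"
```

Running `run_2ota(["aaa", "aaa"], all_a, {"ok"})` accepts the picture after 4 steps, which matches the m + n - 1 step count quoted above for an m by n array; replacing any symbol by `b` makes the bottom-right cell end in the state `bad`, so the picture is rejected.
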
