Download as pdf or txt
Download as pdf or txt
You are on page 1of 9

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/233801608

A Top-Down Chart Parser for Analyzing Arabic Sentences

Article  in  International Arab Journal of Information Technology · May 2012

CITATIONS READS

28 1,695

3 authors:

Dr. Ahmad T. Al-Taani Mohammed Msallam


Yarmouk University Al-Aqsa University
39 PUBLICATIONS   338 CITATIONS    4 PUBLICATIONS   43 CITATIONS   

SEE PROFILE SEE PROFILE

Sana Wedian
University of Ottawa
3 PUBLICATIONS   52 CITATIONS   

SEE PROFILE

All content following this page was uploaded by Sana Wedian on 16 May 2014.

The user has requested enhancement of the downloaded file.


The International Arab Journal of Information Technology, Vol. 9, No. 3, March 2012

A Top-Down Chart Parser for Analyzing


Arabic Sentences
Ahmad Al-Taani1, Mohammed Msallam2, and Sana Wedian1
1
Department of Computer Science, Yarmouk University, Jordan
2
Department of Computer Science, Al-Aqsa University, Palestine

Abstract: Parsing of Arabic sentences is a necessary mechanism for many natural language processing applications such as
machine translation; question answering, knowledge extraction and information retrieval. In this study, we present a top-down
chart parser for parsing simple Arabic sentences, including nominal and verbal sentences within specific domain Arabic
grammar. We used the Context Free Grammar (CFGs) to represent the Arabic grammar. We first developed the Arabic
grammar rules that give precise description of grammatical sentences. Then, we implemented the parser that assigns
grammatical structure to the input sentence. The parser is tested on sentences extracted from real documents. Experimental
results showed the effeteness of the proposed top-down chart parser for parsing modern standard Arabic sentences. From a
practical perspective, the parser is able to satisfy syntactic constraints and reduce parsing ambiguity.

Keywords: NLP, Parser, chart parsing, top-down chart parser, context free grammar, syntactic structures.

Received February 14, 2010; accepted August 10, 2010

1. Introduction 4. The presence of an elliptic personal pronoun


“alDamiir Almustatir”.
Arabic is the fourth most widely spoken language in
the world. It belongs to the Semitic family of An important method for natural language parsing is
languages which differs from Indo-European called Chart Parsing. It consists of a tabular-based, top-
languages in terms of its syntax, semantic and down parsing algorithm [8]. The basic idea is to obtain
morphology. Although different spoken Arabic dialects a table that contains all substructures generated during
exist throughout the Arab world, there is only one form the parsing process through different iterations until all
of the written language found in printed works, and it possible structures for the sentence are obtained. The
is known as Standard Arabic [15]. chart parser takes as an input a sentence which consists
Parsing is defined as the process of identifying the of a set of words and grammar rules that are referred to
structure of a specific sentence according to a given as production rules.
grammar. The term parser is used in cases where the In recent years, research in the field of Natural
sentences are made up of information units of any Language Processing (NLP) systems has witnessed a
kind. significant improvement in an attempt to develop
Parsing Arabic sentences is a difficult task. The various approaches for parsing different languages. For
difficulty is due to the following reasons: first, the Indo-European languages, specific parsing approaches
average length of an Arabic sentence is 20 to 30 words, had been explored in depth and had emphasized some
and in some sentences, the number of words exceeds trends such as lexical function grammars, deterministic
100. Therefore, Arabic sentences, by nature, are long parsing, and closer integration of syntax and semantics
and complex. Second, the Arabic sentence is [10]. As a result, NLP systems for Indo-European
syntactically ambiguous and complex due to the systems had gained strength and power.
frequent usage of grammatical relations, order of In this research, we present a top-down chart parser
words and phrases, conjunctions, and other for parsing simple Arabic sentences, including nominal
constructions such as diacritics (vowels), which is and verbal sentences within specific domain Arabic
known in written Arabic as “altashkiil” [15]. Parsing grammar. Context Free Grammar (CFG) is used to
Arabic sentences is a difficult task due to the following represent the Arabic grammar. CFGs Grammars are
reasons [12]: consisting entirely of rules with a single symbol on the
left-hand side, called the mother. CFGs are a very
1. The length of the Arabic sentences and the important class of grammars for two reasons [11]:
complexity of the Arabic syntax.
2. The omission of diacritics (vowels) in written • The formalism is powerful enough to describe most
Arabic “altashkiil”. of the structure in natural languages.
3. The free word order nature of Arabic sentences.
The International Arab Journal of Information Technology, Vol. 9, No. 3, March 2012

• Yet it is restricted enough so that efficient parsers sentences randomly selected from a corpus of news
can be built to analyze sentences. articles; he claimed a performance of 92%.
Bataineh et al. [5] implemented a top-down parser
We first developed the Arabic grammar rules that give
with recursive transition network for the analyzing
precise description of grammatical sentences. Then, we
Arabic sentences. According to the authors, the system
implemented the parser that assigns grammatical
is tested on 77 sentences and gave a performance of
structure to the input sentence.
85.6%.
The rest of this paper is organized as follows.
Daoud [7] presents a lossless compression algorithm
Section 2 reviews some important related work.
based on the affix analysis that takes advantage of the
Section 3 presents the proposed scheme. Section 4
statistical studies of the diacritical Arabic
discuses the experimental results of the system.
morphological features.
Finally, we present the conclusions drawn from this
Othman et al. [12] used Unification Based Grammar
study in section 5.
(UBG) formalism to write the Arabic grammar rules
using a bottom-up chart parser. The grammar consists
2. Related Work of 170 rules. The rules are divided into 22 groups of
For the last two decades concentration on Arabic rules each of which represents a grammatical category
language processing has focused on morphological such as: object, subject, defined, conjunction form,
analysis. In this field, many working systems have substitution form etc., each grammar rule has the form:
been achieved [3, 4, 7, 10, 14] and many others. rule (LHS, RHS):- constrains. The grammar rules
In contrast, there were less works reported on encode the syntactic and the semantic constrains that
syntactic analysis of Arabic. This is due to challenging help in resolving the ambiguity of parsing Arabic
features of the Arabic language such as high degree of sentences.
ambiguity, complex Arabic syntax and absence of Shaalan et al. [15] developed an Arabic Parser for
regular punctuation. Some progress has been made in modern scientific text. The parser is written in Definite
recent years [2, 5, 9, 12, 13, 14, 15, 16], but there is Clause Grammar (DCG) and is implemented to be part
still no general parser available for Arabic with of a machine translation system. The development
sufficiently wide coverage. At present, no analyzer process of the authors’ Arabic parser involved two
seems to be able to analyze ordinary real-world Arabic steps. In the first step, all rules that make up the Arabic
texts. Most systems simply select types of syntactic grammar and that give a precise account for a sentence
phenomena for treatment, with considerable lexical to be grammatically correct are acquired. In the second
limitations. However, real world texts like article from step, the parser that assigns grammatical structure into
newspaper, abstract from scientific journals or web input sentence was implemented.
pages usually contain all sorts of sentences which Tounsi et al. [16, 17] presented a method for parsing
cause problems for parsers in assigning a suitable Arabic sentences using Treebank-based parsers and
structure [13]. automatic LFG f-structure annotation methodologies
For Arabic language, the development of Arabic developed by Bikel’s [6]. The modified approach
parsing system focused mainly on the analysis of learned ATB functional tags and merge phrasal
Arabic morphology [9]. Rafea et al. [14] analyzed and categories with functional tags in the training data.
discussed the problem of implementing a Most of the related work reported in this study
morphological analyzer for inflected Arabic words. To concentrated on short sentences and used hand-crafted
implement their Arabic parser, they took the advantage grammars, which are time-consuming to produce and
of the already developed morphological analyzer by difficult to scale to unrestricted data. Also, these
integrating it with the Arabic parser. approaches used traditional parsing techniques like
Al-Daoud et al. [2] proposed a framework to top-down and bottom-up parsers demonstrated on
automate the parsing of Arabic sentences. The study simple verbal sentences or nominal sentences with
focused on the simple verbal sentences. The proposed short lengths.
approach is divided into two phases; lexical analysis
and syntax analysis. The proposed system assumes that 3. The Proposed Approach
the entered sentences are correct lexically and
The main goal of the proposed parser is to provide a
grammatically.
computerized system to parse Arabic sentences. In this
Attia [3, 4] investigates different methodologies to
section, we present the architecture of the system, and
manage the problem of morphological and syntactic
discuss the main parts that constitute the proposed top-
ambiguities in Arabic. He built an Arabic parser using
down chart parser. The proposed top-down chart
Xerox linguistics environment which allows writing
parsing scheme consists of three main steps: word
grammar rules and notations that follow the LFG
classification, Arabic grammar identification and
formalisms. Attia tested his approach on short
parsing. The architecture of the system is given in
A Top-Down Chart Parser for Analyzing Arabic Sentences

Figure 1. In this figure, the arrows indicate the flow of 1. Nominal Sentences: The nominal sentence is
information. Boxes are the modules of the system. defined as a sentence that begins with a noun. The
In word classification task, we use the three main parts of such types of sentences are the inchoative
categories that are used in Arabic language to ‫أ‬ylzZ^‫( ا‬al-mubtada') and the predicate oz|^‫( ا‬al-
distinguish between words. These categories are: Khabar). Copulas or action-verbs may be used
Nouns, verbs and particles. In Arabic language, a noun freely in this type of sentences. We have used the
is a word that describes a person, a thing, or an idea. following rules to identify the nominal sentences:
Arabic verbs are similar to those in English. Although
NS  NP NP
the tenses and aspects are different, the verb tenses can
NS  NP NS
be classified as present, past and imperative. The
NS  NP VP
particles are classified as prepositions, adverbs,
NS  NP VS
conjunctions, interrogative particles, exceptions, and
interjections. 2. Verbal Sentences: The verbal sentence is the
sentence that begins with a verb [3]. It has the
Grammar Rule Arabic Grammar following structure: verb + subject + accusative
object. We have used the following rules to identify
Input Sentence verbal sentences:
Parser VS  VP NP
Yes/no & Word Cat.
Lexicon VS  VP NS
3. Domain Words: In addition to the two main classes
Word of sentences (nominal and verbal), the system takes
into consideration the existence of particular domain
Word words that uniquely identify the structure of the
Classifier Word
Word Cat. sentence. Domain words include verbs (‫ل‬Y‫ـــ‬iK€‫)ا‬,
pronouns (o‚YZƒ^‫ )ا‬and particles.
Final Chart
There are two main types of verbs, the intransitive verb
Figure 1. The architecture of the Arabic chart parser (adapted from ‫] ا^†زِم‬ik^‫( ا‬Al fialu al-lāzim) and the transitive verb
[7]).
‫ي‬yilZ^‫] ا‬ik^‫ا‬. Intransitive verb takes only a subject ]ˆ ِ Yk^‫ا‬,
In Arabic language, words are usually formed as a as in the sentence : y^َrَ ^‫ء ا‬Y‰ (the boy came). Transitive
sequence of (antefix, prefix, core, suffix, postfix). In verb ‫ي‬yilZ^‫] ا‬ik^‫( ا‬Al fialu al-mutaaddī) takes a subject
the proposed chart parser, we classify words into nouns ]ِˆYk^‫ ا‬and an accusative object, aِ `ِ ‫ل‬riْkZَ ^‫ا‬, as in the
or verbs based on their affixes and some other rules. sentence: ‫رس‬y^‫Œ ا‬ZŽl^‫ ا‬l‫( آ‬the student wrote the lesson).
Table 1 shows some of the affixes that we have used in It is important to illustrate here that the system has
our work. Moreover, we have used more rules of the ability to identify the structure of a sentence if it
patterns that clearly distinguish words from nouns [1]. has one verb or two non-consecutive verbs, like the
Examples on these rules are mentioned in Table 2. sentence: l‫ء ا^Œي آ‬Y‰. In addition, the system can
identify verbs of the present or past tenses.
Table 1. The used prefixes and suffixes. The system can identify two main types of
Class Prefixes Suffixes pronouns, the independent pronouns Ž‘kuZ^‫ ا‬o‚YZƒ^‫( ا‬al-
,IJK ،MJK ،NJK ،IO ،MO ،NO ،‫ وا‬،IV ،WX ،YZX ،W[ ،‫ ي‬،‫ون‬ damā'ir Al-munfasila), that are written as a standalone
Verb
‫ س‬،‫ ت‬،‫ ي‬،‫ ن‬،‫ا‬ ‫ ن‬،‫ان‬ words and cover only subject functions, and the linked
،aX ،bcX ،bdX ،IX ،YdX ، eX ، WcX ، ‫اء‬ pronouns Ž‘lZ^‫ ا‬o‚YZƒ^‫( ا‬al-damā'ir al-muttasila) that
Noun ‫ ال‬،]^ ،‫ل‬Y‫ آ‬،‫ل‬YK ،‫ل‬Y`
‫ ات‬،‫ ة‬،YZcX
are always written attached to the end of the word they
Table 2. Pattern rules used to classify words. refer to, and are used as personal pronouns subject,
Class Pattern Prefix infix Examples
object and also have a possessive function. For
،bnolO‫ا‬،‫دع‬rlO‫ا‬ example, bd`Yl‫( آ‬their book) (plural masculine) =
Verb ]
َ iَ kْ lَ O
ْ‫إ‬ MْO‫إ‬ -
WJslO‫ ا‬،tlulO‫ا‬ Kitabuhum, YZc`Yl‫( آ‬your book) (dual male or female) =
،‫م‬YnolO‫ ا‬،‫داع‬rlO‫ا‬ Kitabukumaa. The Arabic independent and linked
Noun ‫ل‬
َ Yَikْ lَ O
ْ‫إ‬ MْO‫إ‬ ‫ا‬
‫ن‬YJslO‫ ا‬،‫ج‬YlulO‫ا‬
pronouns are illustrated in Tables 3 and 4, respectively.
The system can identify the structure of sentence
For the sake of Arabic grammar Identification, we
that contains different types of particles, these include:
used the CFGs to represent the structure of sentences
in terms of what phrases are subparts of others. We • Conjunctions between two pronouns or two nouns.
classify the rules of grammar into two categorizes For example: (and: wa ‫)و‬, (then: thoma b“), (but: bl
according to the type of Arabic sentence: ]`), (or: ao ‫ أو‬, am ‫)أم‬, (excluding: ”– ‫ – ف‬Wc^ -
—ln).
The International Arab Journal of Information Technology, Vol. 9, No. 3, March 2012

• Exceptive particles, such as: But: (ada ‫ا‬yˆ, shoa From this figure, the user can either choose to
‫ى‬rO, ila ”‫)إ‬, (rather than: gher o›), (except: hasha automatically categorize the components of the
YœYn), (khala †). inserted sentence, or to do this task manually. If the
• Particles which introduce the present verb, such as: user chooses the automatic categorization, he must
(not: Ln W^, Lm b^, Lma YZ^), (to: Ki I‫)آ‬, (shall press “Categorization” button and the words categories
“saofa” ‫ف‬rO, seen WO), (edn: ‫)إذن‬. will be displayed in the “Word Categorization” pane,
• Prepositions which introduce the noun, such as: as shown in Figure 3. If automatic categorization fails,
(with: ma’a, Ÿ ), (on: ala —Žˆ), (to: ila —^‫)ا‬, (in: fi however, the user can press “Manual Cat” button to
I
ِ K), (about: an Wˆ). manually insert the correct category as illustrated in
Figure 4.
Table 3. Independent pronouns.

Pronouns
He huwa r‫ه‬ I Ana YV‫أ‬
She heya I‫ه‬ We nahnu WsV
They humā YZ‫ه‬ You anta MV‫أ‬
Nominative
They Hum b‫ه‬ You antumā YZlV‫أ‬
You
They Hunna W‫ه‬ antum blV‫أ‬
(pl.)
You antuna WlV‫أ‬
‫ي‬Y[‫ –إ‬YVY[‫ك –إ‬Y[‫ –إ‬YZ‫آ‬Y[‫ –إ‬b‫آ‬Y[‫ –إ‬W‫آ‬Y[‫ –إ‬£Y[‫ –إ‬Y‫ه‬Y[‫ إ‬-
Accusative
YZ‫ه‬Y[‫ –إ‬b‫ه‬Y[‫ إ‬- W‫ه‬Y[‫إ‬ Figure 3. Word categorization.
Table 4. Linked pronouns.
Pronouns
The Imperfect verb with ( you
masculine dual, they masc. ‫ــ‬JZ|^‫ل ا‬Y‫ــ‬iK€‫ا‬
Dual, you masc. Plural, they ]ˆYk^‫ء ا‬YX - YV -Wu“”‫ ا‬¥^‫ أ‬-ˆYZ¤^‫واو ا‬
masc. Plural, you feminine – z¦Y|Z^‫ء ا‬Y[ -
dual)

Possessive pronouns bŽclZ^‫ء ا‬Y[– ¦Y|Z^‫ف ا‬Y‫‚ – آ‬Y§^‫ء ا‬Y‫ه‬

3.1. System’s Main Functions


3.1.1. Words Categorization Figure 4. Manual categorization.
The system provides the user with the ability to
perform both an automatic categorization and manual 3.1.2. The Parser
categorization. The selection of either method depends After that the user can use the parser where he must
on the extent to which the automatic categorization press the “Run Parser” button. Figure 5 shows each
succeeds. That is, the user might initially use the step for parsing the sentence to identify its structure.
automatic categorization to break the sentence down
into its main components. However, if the automatic
categorization failed to find the correct components,
then the user can perform the manual categorization, in
which he/she will be to insert the component type (i.e.,
PRO, N, ART, etc.,) manually. To clarify these two
methods, we have initially inserted the sentence: ‫ و‬YV‫أ‬
(‫أي‬o^‫ ا‬IK ‫ن‬Y¨kl  MV‫)أ‬, as shown in Figure 2.

Figure 5. Parsing sentence.

3.1.3. Final Chart


Finally, after the parsing process is completed, the user
can view the final chart of the inserted sentence. Figure
5 shows the parsing process for the nominal sentence
“‫أي‬o^‫ ا‬IK ‫ن‬Y¨kl  MV‫ وأ‬YV‫”أ‬. The final chart for this sentence
Figure 2. Parser interface.
A Top-Down Chart Parser for Analyzing Arabic Sentences

is shown in Figure 6. For purpose of explanation, the Comparison of related work was a difficult task in
final chart for the given sentence is also given in this study because most of the reported related work
Figure 7. did not present any experimental results. Only three
related work could be compared partially to our
approach.
Bataineh and Bataineh [5] reported that 85.6% of 90
sentences were parsed successfully using their top-
down parser, Tounisi et al. [16] reported about 77%
parsing accuracy on parsing Arabic sentences using
Treebank-based corpus. Ouersighni [13] has used 105
Arabic sentences to test the morphological analyser
which gave an accuracy of about 89%, but he didn’t
report any results for the parser. In contrast, we
reported an average accuracy of 94.3% using more
Figure 6. The chart for “‫أي‬o^‫ ا‬IK ‫ن‬Y¨kl  MV‫ و أ‬YV‫”أ‬. efficient parsing method, top-down chart parser. The
only previous approach similar to our approach is the
NS3 (rule 6 with NP2 & NS2) work presented by Othman et al. [12] which used
NS2 (rule 3 with NP3 & NP4) bottom-up chart parsing, but the authors didn’t present
NP4 (rule 14 with any experimental results in their study; instead, they
ARTN1 & N2)
NS1 (rule 3 with NP2 & NP3)
explained the bottom-up parser on a simple Arabic
NP3 sentence of length 3 words.
(rule 12)
NP2 (rule 17 with PRO1 &
CONJ! & PRO2) 5. Conclusions and Future Work
NP1
(rule 13) In this work, we have presented an efficient top-down
PRO 1 CONJ 1 PRO 2 N1 ARTN 1 N2 chart parser for parsing simple Arabic sentences. We
1 YV‫أ‬ 2 ‫و‬ 3 MV‫أ‬ 4 ‫ن‬Y¨kl  5 IK 6 ‫أي‬o^‫ا‬ have used CFG to represent the Arabic grammar. We
Figure 7. The chart for “‫أي‬o^‫ ا‬IK ‫ن‬Y¨kl  MV‫ و أ‬YV‫”أ‬. depended solely on Arabic grammar rules in parsing to
determine the structure of sentence within specific
4. Experimental Results domain of Arabic grammar. The grammar rules encode
the syntactic and the semantic constrains that help in
The parser is tested on 70 sentences extracted from resolving the ambiguity of parsing Arabic sentences.
Arabic documents. Sentences have different sizes from The proposed parsing technique will provide
2 to 6 words. The performance of the system was very promising impact on many language applications such
good in all experiment scenarios for the various sizes as question answering and machine translation,
of sentences. Table 5 shows the performance of the because the source sentences will be analyzed
parsing experiments while the results for word according to the grammar rules that represent their
classification according to their affixes and other rules intended meaning. Thus, reduces syntactic and
are presented in Table 6. semantic ambiguity.
After the review of related work in this study, we
Table 5. The results of the parsing sentences.
have drawn the following conclusions:
Type Total Correct %
1. Most of reported related work used traditional
Nominal Sentence 36 35 97.2 % parsing techniques like top-down and bottom-up
parsers with different methods for representing the
Verbal Sentence 34 31 91.2 % grammar like recursive transition networks and
GFS. These methods are not robust enough for
Total 70 66 94.3 %
natural languages like the Arabic language.
Table 6. The results of the word classification. 2. Most of these approaches used simple verbal
sentences or nominal sentences with short lengths.
Type Total Correct %
Noun 175 159 90.9%
Despite all the above facts, our proposed top-down
chart parser has the following advantages over existing
Verb 42 39 92.9%
approaches:
Pronoun 42 36 85.7%
Article (Noun) 45 42 93.3% • It analyses both Arabic nominal and verbal
Article (Verb) 2 2 100%
sentences regardless the length of the sentence. The
grammar and the Arabic sentences used in this study
Conjunction 7 7 100%
Total 313 285 91%
The International Arab Journal of Information Technology, Vol. 9, No. 3, March 2012

are presented in Appendix A and Appendix B, [8] Dick G. and Ceriel H., “Parsing Techniques, a
respectively. Practical Guide,” Technical Report, England,
• It uses efficient parsing techniques, i.e., a top-down 1990.
chart parser. [9] Ditters E., “A Formal Grammar for the
• Experimental results showed the effectiveness of the Description of Sentence Structure in Modern
system for analysing both verbal and nominal Standard Arabic,” in Proceedings of Arabic NLP
sentences. Workshop at ACL/EACL, Ouersighni, pp. 44-46,
2001.
Future work will focus on the following related issues: [10] Feddag A, “Arabic Morpho-Syntax and Semantic
• Identifying passive or active voice verbs since in Parsing,” in Proceedings of the 3rd International
this research we did not take diacritics (vowels) into Conference and Exhibition on Multi-lingual
consideration. Computing, UK, pp. 125-129, 1992.
• Identifying more than one linked pronoun in the [11] James A., Natural Language Understanding,
word, as in the verb “a‫ـ‬lŽ`Yª” contains two linked Benjamin/Cummings, CA, 1995.
pronouns, the first is “]ˆYk^‫ء ا‬YX”, and the second is [12] Othman E., Shaalan K., and Rafea A., “A Chart
“‚Y§^‫ء ا‬Y‫”ه‬. Parser for Analyzing Modern Standard Arabic
• Identifying the existence of two consecutive Sentence,” in Proceedings of the MT Summit IX
prepositions as in the sentence: «‫ ا‬yuˆ W  ]‫] آ‬ª. Workshop on Machine Translation for Semitic
• In addition, our system cannot identify particles that Languages: Issues and Approaches, USA, pp.
are used to introduce two present verbs in the 33-39, 2003.
sentence such as (W – Y  – YZd  – —u  – ‫ن‬Y[‫ – أ‬W[‫– أ‬ [13] Ouersighni R., “Towards Developing A Robust
YZu[‫ –أ‬IV‫ – أ‬YZ¬n - YZk‫)آ‬. This will be considered for Large-Scale Parser for Arabic Sentences,” in
future work. Proceedings of the International Arab
Conference on Information Technology, Tunisia,
pp. 15-18, 2008.
References [14] Rafea A. and Shaalan K., “Lexical Analysis of
[1] Abuleil S. and Evens M., “Discovering Lexical Inflected Arabic Words Using Exhaustive Search
Information by Tagging Arabic Newspaper of an Augmented Transition Network,”
Text,” Computer Journal of Computers and the Computer Journal of Software Practice and
Humanities, vol. 36, no. 2, pp. 191-221, 1998. Experience, vol. 23, no. 6, pp. 567-588, 1993.
[2] Al-Daoud E. and Basata A., “A Framework to [15] Shaalan K., Farouk A., Rafea A., “Towards an
Automate the Parsing of Arabic Language Arabic Parser for Modern Scientific Text,” in
Sentences,” Computer Journal of The Proceeding of the 2nd Conference on Language
International Arab Journal of Information Engineering, Egyptian Society of Language
Technology, vol. 6, no. 2, pp. 191-195, 2009. Engineering, Egypt, pp. 103-114, 1999.
[3] Attia M., “An Ambiguity-Controlled [16] Tounsi L., Attia, M., and Genabith J., “Parsing
Morphological Analyzer for Modern Standard Arabic Using Treebank-Based LFG Resources,”
Arabic Modeling Finite State Networks,” in in Proceedings of the LFG09 Conference,
Proceedings of The Challenge of Arabic for Cambridge, pp. 583-586, 2009.
NLP/MT Conference, London, pp. 155-159, [17] Tounsi L., Attia M., and Genabith J., “Automatic
2006. Treebank-Based Acquisition of Arabic LFG
[4] Attia M., “Handling Arabic Morphological and Dependency Structures,” in Proceedings
Syntactic Ambiguity within the LFG Framework European Chapter of the Association for
with a View to Machine Translation,” PhD Computational Linguistics, EACL Workshop
Thesis, University of Manchester, UK, 2008. Computational Approaches to Semitic
[5] Bataineh B. and Bataineh E., “An Efficient Languages, Athens, pp. 45-52, 2009.
Recursive Transition Network Parser for Arabic
Language,” in Proceedings of the World
Congress on Engineering WCE, UK, pp. 124-
127, 2009.
[6] Bikel D., “Intricacies of Collins Parsing Model,”
Computer Journal of Computational Linguistics,
vol. 30, no. 4, pp. 253-259, 2004.
[7] Daoud A., “Morphological Analysis and
Diacritical Arabic Text Compression,” Computer
Journal of the International Journal of ACM
Jordan, vol. 1, no. 1, pp. 41-47, 2010.
A Top-Down Chart Parser for Analyzing Arabic Sentences

Appendix A: Grammar Rules ‫ة‬yn‫ وا‬O‫ر‬y  [o¨^Y` .35


‫ح‬Y¤u^‫ ا‬IK ] €‫ت ا‬Yd‫ه‬ .36
1. VS --> VP NP ‫ن‬Ysl ”‫ ا‬¤lV ‫ر إˆ†ن‬o¨X .37
2. VS --> VP NS
W[odœ yi` ¤lu^‫ ا‬WŽiX ‫ف‬rO .38
3. NS --> NP NP
‫ك‬r ols[ ‫س‬Yu^‫م ا‬oln‫ا‬ .39
4. NS --> NP VP
ˆY¤º` µ|^‫ ا‬bŽcX .40
5. NS --> NP VS
‫اء‬oZs^‫ ا‬nYkl^‫ ا‬y[¯  ]‫أآ‬ .41
6. NS --> NP NS osz^‫ ا‬W  »[o§^‫ذ ا‬Y¨V¼ Mˆoœ .42
7. VP --> ARTV V1
Y‫ره‬Y‫[¨ أزه‬ys^‫ ا‬IK Iuz¤i[ .43
8. VP --> V
ª‫ر‬r^‫ ˆŽ— ا‬l‫آ‬ .44
9. VP --> V1
‫د‬o§X ‫را‬r¦ ŸZO .45
10. VP --> V P
WsV ”‫ إ‬NKYc[ b^ .46
11. VP --> V1 P o§‘^‫[» ا‬o§^‫ذ ا‬Y¨V¼ Mˆoœ .47
12. NP --> N
‫ور‬²^‫دة ا‬Ydœ ‫ل‬r¨^‫½ ا‬³[ .48
13. NP --> PRO
‫رس‬y^‫ ا‬YV‫أت أ‬oª .49
14. NP --> ARTN N
O‫ر‬yZ^‫ إ^— ا‬¥Or[ ‫ و‬y[‫ذه ز‬ .50
15. NP --> ARTN PRO
”‫ز‬YuK ‫ر‬Yu[y` ‫ب‬r¬^‫ ا‬olœ‫ا‬ .51
16. NP --> N CONJ N
‫ء‬YiZ‰ YdŽ‫[ آ‬o¨^‫ ا‬M¾¨lO‫ا‬ .52
17. NP --> PRO CONJ PRO
n‫ا‬o^‫ء ا‬Y§l`” MJŽ‰ .53
18. NP --> N CONJ PRO ½Zº^‫ع ا‬Yiœ ‫م‬r§^‫د ا‬oX .54
19. NP --> PRO CONJ N
bd`‫ن إ” دوا‬rz‚Y§^‫د ا‬Yˆ .55
µ £r`‫ ا^Œي أ‬IV‫زار‬ .56
Appendix B: Sentences Used For Testing [oª ‫ت‬rZ^‫ أن ا‬MZŽˆ .57
Z‫ و‬alzªYˆ ]d¤^‫ا‬ .1 ‫ري‬Y‰ I‫ن أ‬Y‫أه‬ .58
o‚Y¦ ‫ة‬o¤œ ]‫ق آ‬rK .2 o¨k^‫ك ا‬r‫ أ‬YJ‫آ‬ .59
Wu ¯Z^‫ل ا‬Y‘ W  oz‘^‫إن ا‬ .3 Y`‫و‬yu  ‫ن‬ri`‫ أر‬oZX¯Z^‫ ا‬oƒn .60
bŽiZ^‫ ا‬W  ªYi  ‫ك‬r‫أ‬ .4 yziV ‫ك‬Y[‫إ‬ .61
‫ت‬Yd €‫ام ا‬yª‫ أ‬MsX u¤^‫ا‬ .5 ‫ن‬r NZ^‫ ا‬b“ yœo^‫ت ا‬Y  .62
‫ء ا€دب‬YJ^ ‫ب‬Y¨i^‫” ا‬r^ .6 o‘  IK ²[YK bŽiX .63
YZŽª ‫ن‬r“†“ ‫ي‬yuˆ .7 †‘k  oz|^‫ا ا‬yZs  ‫ت‬NzV‫أ‬ .64
yk  auc^ ±‫ب ر‬Ylc^‫ا‬ .8 I·YZ^‫م ا‬Yi^‫ ا‬IK y^Y ‫ت‬Y  .65
‫ون‬²‚Yk^‫ ا‬b‫ ه‬e³^‫أو‬ .9 Y¨¦YV ‫ن‬YJV¼‫Ž» ا‬ .66
b Œ^‫ˆ— ا‬oV ‫ب‬oi^‫ ا‬WsV .10 ‫آ‬oº^‫ ا‬o[y  MŽ`Yª .67
zsZ^‫ ا‬W  ]zV‫ƒŽ أ‬K ” .11 ‫راة‬YzZ^‫ ا‬IK Yu¨[oK ‫ز‬YK .68
‫م‬r¨^‫ ا‬IK ]ªYˆ Y  .12 YV‫ر‬Y‰ ‫رة‬YO M[‫رأ‬ .69
‫ب‬Yz^‫ة ˆŽ— ا‬o¨K ‫أة‬o ‫ر‰] و ا‬ .13 W¨l  ]Zˆ ]Zˆ .70
Yu¨[y´ oZO ‫ن‬Y‫آ‬ .14
‫ر‬Yi^‫ ا‬YdlzªYˆ VY|^‫إن ا‬ .15 Ahmad Al-Taani is an associate
Iª†‫ هŒب أ‬l‫آ‬ .16 professor of artificial intelligence at
‫م‬r^‫دم ا‬Yª o› y[‫ز‬ .17 Yarmouk University, Jordan. He
µ XY‫ آ‬oˆYœ ‫ˆ†ء‬ .18 received his PhD in computer vision
o¤k^‫ŽŸ ا‬µ  —ln I‫†م ه‬O .19 from University of Dundee, UK in
‫أي‬o^‫ ا‬IK ‫ن‬Y¨kl  MV‫ و أ‬YV‫أ‬ .20 1994. He was the dean of the
‫ء‬Yu`€‫¯و^ ا‬J  ‫ء‬Y`¶‫ˆŽ— ا‬ .21 Faculty of Science and Information
‫د‬rZs  ‫ و‬yZs  ‫ ذه‬O‫ر‬yZ^‫إ^— ا‬ .22 Technology at Jadara University during his sabbatical
‫ة‬rk‫ ه‬b^Yˆ ]c^ .23 leave in 2008. In 2009, he is appointed as the Vice
Yd‫آ‬rŽJ^ ‫ا‬o[y¨X `rzs  ‫ة‬Ylk^‫ا‬ .24 Dean of the Faculty of Information Technology and
bno^‫ر ا‬rk§^‫ ا‬r‫ه‬ .25 Computer Science at Yarmouk University. He has
Žn o ‫Ž ا€دب‬n .26 several publications in refereed journals and
‫ل‬yi^Y` ‫ن‬YZcs[ ‫ن‬Y·Y¨^‫ا‬ .27 conference proceedings. His research interest includes:
]Zi^‫ ا‬aOYO‫ح أ‬Y¤u^‫ا‬ .28 artificial intelligence applications, pattern recognition,
]‰‫ ر‬Yul` IK .29 image processing, and Arabic language processing.
y[yœ £‫ؤ‬r· ‫ح‬Yz‘Z^‫إن ا‬ .30
²[²ˆ ‫ي‬r¨^ «‫إن ا‬ .31
o|` blV‫م و أ‬Yˆ ]‫آ‬ .32
‫م‬²‫و ه‬yi^‫ا‬ .33
‫ة‬oz ^Yµ^‫ ا‬W  o¬‫رس أآ‬yZ^‫ا‬ .34
The International Arab Journal of Information Technology, Vol. 9, No. 3, March 2012

Mohammed Msallam received his Sana Wedian is a lecturer at the


BSc in educational computer from Department of Computer Science at
Al-Aqsa University, Gaza in 2002, Yarmouk University. She received
and MSc in computer science from her MSc in computer science in 2009
Yarmouk University, Jordan in from Jordan University of Science
2009. Currently, he is a lecturer at and Technology. Her research
the department of computer science interest includes: Arabic language
at Al-Aqsa University. His research interest includes: processing, evolutionary computing, and web spam
evolutionary computation, swarm intelligence, and detection.
Arabic language processing.

View publication stats

You might also like