Dukes 11 PGR
Kais Dukes
I-AIBS Institute for Artificial Intelligence and Biological Systems, School of Computing, University of Leeds
- David: 1000 BCE?
- Jesus: 1 CE
- The Quran, Muhammad (PBUH): 610-632 CE
- Orthography (diacritics and vowelization)
- Etymology (Semitic roots)
- Morphology (derivation and inflection)
- Syntax (origins of dependency grammar)
- Discourse Analysis & Rhetoric
- Semantics & Pragmatics
Morphology (word structure)
- Arabic is highly inflected, challenging to analyze
Grammar
- Phrase structure, dependency
Semantics
- Ontology of entities and concepts referred to by pronouns and nouns
Google Search for verse (68:38) on Jan 21, 2008 shows many typos
- Manually verified to 100% accuracy by a group of experts who have memorized the entire text of the Quran
- Not manually verified
- Authors report an F-measure of 86%
- Non-standard annotation scheme not familiar to traditional Arabic linguists, e.g. extracting a list of all verbs is non-trivial
- Arabic text is only encoded phonetically instead of using the original Arabic, e.g. searching for a specific root is not easy
- Common standard for verses (Chapter:Verse)
- Extended in the QAC corpus to include word numbers and segment numbers, e.g. (21:70:4:2)
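The location scheme above can be sketched in a few lines of Python. This is only an illustration of the (Chapter:Verse:Word:Segment) notation shown on the slide; the names `Location` and `parse_location` are hypothetical and not part of the QAC corpus, and the check that every number is at least 1 is an assumption.

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    """A reference in (Chapter:Verse[:Word[:Segment]]) notation."""
    chapter: int
    verse: int
    word: Optional[int] = None      # only present in the extended form
    segment: Optional[int] = None   # only present in the extended form


def parse_location(text: str) -> Location:
    """Parse strings such as '(68:38)' or '(21:70:4:2)'."""
    parts = text.strip().strip("()").split(":")
    if not 2 <= len(parts) <= 4:
        raise ValueError(f"expected 2 to 4 numbers, got {len(parts)}: {text!r}")
    numbers = [int(p) for p in parts]
    # Assumption: all components are counted from 1.
    if any(n < 1 for n in numbers):
        raise ValueError(f"components must be positive: {text!r}")
    return Location(*numbers)


if __name__ == "__main__":
    print(parse_location("(21:70:4:2)"))
    print(parse_location("68:38"))
```

Keeping the word and segment numbers optional lets the same type hold both a plain verse reference and the extended form.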
- Division of a single word into multiple segments
- Part-of-speech tag assigned to each segment
- Traditional Arabic Grammar rules used for division
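One way to picture segment-level tagging is as a word holding an ordered sequence of (form, tag) segments. The sketch below is an assumption about how such data could be represented, not the corpus's actual format; the class names, the placeholder forms, and the tag strings "CONJ" and "V" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Segment:
    form: str  # surface form of the segment (placeholder text here)
    tag: str   # part-of-speech tag assigned to this segment


@dataclass(frozen=True)
class Word:
    segments: Tuple[Segment, ...]  # ordered from first to last segment


def segments_with_tag(words: Iterable[Word], tag: str) -> List[Segment]:
    """Collect every segment carrying the given tag, in text order."""
    return [s for w in words for s in w.segments if s.tag == tag]


if __name__ == "__main__":
    # Hypothetical word made of a prefixed conjunction and a verb stem.
    word = Word((Segment("prefix-form", "CONJ"), Segment("stem-form", "V")))
    print(segments_with_tag([word], "V"))
```

With tags attached to segments rather than whole words, a query such as "list all verbs" reduces to a filter on the tag field.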
- Artificial Intelligence and Computational Linguistics
- Arabic linguistics
- Quranic and Islamic Studies
- Classical literature analysis
- Anyone who wants to appreciate the Quran
- Machine-Learning parser
- Part-of-speech tags adapted from Traditional Arabic Grammar, and mapped to English equivalents (not the other way around)
- These tags apply to words in the Quran, as well as to individual morphological segments in the text
Automatic Annotation
Classical Arabic Dependency Parser
- Joakim Nivre (2009) dependency parsing using a shift/reduce queue/stack architecture with machine learning
- Following a similar architecture, but with hand-written rules, the custom parser has an F-measure of 77.2%
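A minimal sketch of a shift/reduce loop driven by hand-written rules, to make the queue/stack idea concrete. It is not the custom parser described on the slide: the function names, the one-token lookahead, the tags "V", "N", "ADJ", and the labels "obj" and "adj" are all assumptions chosen for illustration.

```python
from typing import Callable, List, Optional, Tuple

Arc = Tuple[int, int, str]       # (head index, dependent index, label)
Action = Tuple[str, str]         # ("left" or "right", label)
Rules = Callable[[str, str, Optional[str]], Optional[Action]]


def parse(tags: List[str], rules: Rules) -> List[Arc]:
    """Greedy shift/reduce parse over a stack and a queue of token indices."""
    stack: List[int] = []
    queue: List[int] = list(range(len(tags)))
    arcs: List[Arc] = []
    while queue or len(stack) > 1:
        action = None
        if len(stack) >= 2:
            below, top = stack[-2], stack[-1]
            lookahead = tags[queue[0]] if queue else None
            action = rules(tags[below], tags[top], lookahead)
        if action is None:
            if not queue:
                break                      # no rule fires, nothing to shift
            stack.append(queue.pop(0))     # shift
        elif action[0] == "left":          # top becomes head of below
            arcs.append((stack[-1], stack[-2], action[1]))
            del stack[-2]
        else:                              # "right": below becomes head of top
            arcs.append((stack[-2], stack[-1], action[1]))
            stack.pop()
    return arcs


def toy_rules(below: str, top: str, lookahead: Optional[str]) -> Optional[Action]:
    """Two hypothetical hand-written rules for the demo below."""
    if below == "N" and top == "ADJ":
        return ("right", "adj")
    # Wait to attach the noun until its adjective has been attached.
    if below == "V" and top == "N" and lookahead != "ADJ":
        return ("right", "obj")
    return None


if __name__ == "__main__":
    print(parse(["V", "N", "ADJ"], toy_rules))
```

In a machine-learning parser of the kind cited above, the `rules` callable would be replaced by a trained classifier that predicts the next action; here the decision is hard-coded, which is the only difference in the control loop.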
Conclusion
This is not the end. To come: 2nd half of PhD project; and more?