
Arabic Language Processing From

Theory to Practice 7th International


Conference ICALP 2019 Nancy France
October 16 17 2019 Proceedings Kamel
Smaïli
Kamel Smaïli (Ed.)

Communications in Computer and Information Science 1108

Arabic Language
Processing
From Theory to Practice
7th International Conference, ICALP 2019
Nancy, France, October 16–17, 2019
Proceedings
Communications
in Computer and Information Science 1108
Commenced Publication in 2007
Founding and Former Series Editors:
Phoebe Chen, Alfredo Cuzzocrea, Xiaoyong Du, Orhun Kara, Ting Liu,
Krishna M. Sivalingam, Dominik Ślęzak, Takashi Washio, Xiaokang Yang,
and Junsong Yuan

Editorial Board Members


Simone Diniz Junqueira Barbosa
Pontifical Catholic University of Rio de Janeiro (PUC-Rio),
Rio de Janeiro, Brazil
Joaquim Filipe
Polytechnic Institute of Setúbal, Setúbal, Portugal
Ashish Ghosh
Indian Statistical Institute, Kolkata, India
Igor Kotenko
St. Petersburg Institute for Informatics and Automation of the Russian
Academy of Sciences, St. Petersburg, Russia
Lizhu Zhou
Tsinghua University, Beijing, China
More information about this series at http://www.springer.com/series/7899
Editor
Kamel Smaïli
University of Lorraine
Nancy, France

ISSN 1865-0929 ISSN 1865-0937 (electronic)


Communications in Computer and Information Science
ISBN 978-3-030-32958-7 ISBN 978-3-030-32959-4 (eBook)
https://doi.org/10.1007/978-3-030-32959-4

© Springer Nature Switzerland AG 2019


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, expressed or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

This book constitutes the refereed proceedings of the 7th International Conference on
Arabic Language Processing ICALP 2019 (ex-CITALA), held in Nancy, France, in
October 2019. The 21 full papers presented were carefully reviewed and selected from
about 40 submissions. 60% of the articles have been reviewed by three reviewers and
the others by four reviewers using the double-blind review process.
The conference highlighted new approaches related to the Arabic language from
basic theories to applications. All the branches of natural language processing
(NLP) related to Arabic spoken or text language processing constituted the main kernel
of ICALP 2019. The papers covered the following topics: modern standard Arabic,
sentiment analysis and opinion mining, code-switching, deep learning, and other
aspects of NLP. The volume is organized in four parts: the first is devoted to Arabic
dialects and sentiment analysis, an area for which there are several challenges; the
second part contains papers focusing on neural techniques for text and speech; the third
part comprises papers describing different aspects of modeling modern standard
Arabic; and the last one is dedicated to the resources that play an important role in
NLP.

October 2019 Kamel Smaïli


Organization

General Chair
Kamel Smaïli University of Lorraine, France

Program Committee Chair


Kamel Smaïli University of Lorraine, France

Steering Committee
Karima Abidi University of Lorraine, France
Karim Bouzoubaa University of Mohammed V, Morocco
Olivia Brenner University of Lorraine, France
Joseph Di Martino University of Lorraine, France
Abdelhamid EL Jihad University of Mohammed V, Morocco
Abdelfettah Hamdani University of Mohammed V, Morocco
Abdelmonaime Lachkar Université Sidi Mohamed Ben Abdellah, Morocco
David Langlois University of Lorraine, France
Azzedine Lazrek Université Cadi Ayyad, Morocco
Abdelhak Lekhouaja Université Mohammed Ier, Morocco
Azzedine Mazroui Université Mohammed Ier, Morocco
Mohamed-Amine Menacer University of Lorraine, France

Program Committee
Ahmed Ali Qatar Computing Research Institute (QCRI), Qatar
Hassina Aliane Centre de Recherche sur l’Information Scientifique
et Technique, Algeria
Frédéric Béchet University of Aix Marseille Université, France
Almoataz Bellahal-Said Cairo University, Egypt
Laurent Besacier University of Grenoble, France
Karim Bouzoubaa University of Mohammed V, Morocco
Violetta Cavalli-Sforza Al Akhawayn University, Morocco
Khalid Choukri European Language Resource Association (ELRA),
France
Kareem Darwish Qatar Computing Research Institute, Qatar
Joseph Di Martino Loria, University of Lorraine, France
Mona Diab George Washington University, USA
Abdelhamid El Jihad University of Mohammed V, Morocco
Mahmoud El-Haj School of Computing and Communications Lancaster
University, UK

Yannick Estève LIUM University of Maine, France


Albert Gatt Institute of Linguistics and Language Technology,
University of Malta, Malta
Nada Ghneim Higher Institute for Applied Science and Technology,
Syria
Ahmed Guessoum University of Science and Technology Houari
Boumediene, Algeria
Hatem Haddad Université Libre de Bruxelles, Belgium
Abdelfetah Hamdani University of Mohammed V, Morocco
Yannis Haralambous Institut Mines-Télécom, UMR CNRS 6285
Lab-STICC, France
Salima Harrat École Normale Supérieure Bouzaréah, Algeria
Mark Hasegawa-Johnson University of Illinois, USA
Mustafa Jarrar Birzeit University, Sina Institute, Palestine
Denis Jouvet Loria, Inria, France
Abdelmonaime Lachkar Université Sidi Mohamed Ben Abdellah, Morocco
David Langlois Loria, University of Lorraine, France
Abdelhak Lakhouaja Université Mohammed Ier, Morocco
Azzedine Lazrek Université Cadi Ayyad, Morocco
Yves Lepage University of Waseda, Japan
Azzedine Mazroui Université Mohammed Ier, Morocco
Karima Meftouh University of Annaba, Algeria
Farid Meziane University of Salford, UK
Vito Pirreli Institute for Computational Linguistics CNR, Italy
Violaine Prince LIRMM, France
Allan Ramsay School of Computer Science,
University of Manchester, UK
Horacio Rodríguez Universitat Politécnica de Catalunya, Spain
Paolo Rosso Universidad Politécnica de Valencia, Spain
Motaz Saad University Islamic of Gaza, Palestine
Fatiha Sadat University of Québec, Montréal, Canada
Khaled Shaalan The British University, UAE
Olivier Siohan Google, USA
Kamel Smaïli Loria, University of Lorraine, France
Adnan Yahya Birzeit University, Palestine
Abdella Yousfi Mohammed V University, Morocco
Wajdi Zaghouani Carnegie Mellon University, Qatar
Imed Zitouni Microsoft Research, USA

Sponsors

Google
University of Lorraine
Loria: Laboratoire lorrain de recherche en informatique et ses applications
ELRA: European Language Resources Association
SIGUL: Special Interest Group: Under-resourced Languages
IDMC: Institut des Sciences Digitales
OLKI: Open Language and Knowledge for Citizens
ALESM: The Arabic Language Engineering Society in Morocco
Contents

Arabic Dialects and Sentiment Analysis

Generating a Lexicon for the Hijazi Dialect in Arabic
Fatimah Abdullah Alqahtani and Mark Sanderson

Natural Arabic Language Resources for Emotion Recognition in Algerian Dialect
Habiba Dahmani, Hussein Hussein, Burkhard Meyer-Sickendiek,
and Oliver Jokisch

An Empirical Evaluation of Arabic-Specific Embeddings for Sentiment Analysis
Amira Barhoumi, Nathalie Camelin, Chafik Aloulou, Yannick Estève,
and Lamia Hadrich Belguith

A Fine-Grained Multilingual Analysis Based on the Appraisal Theory:
Application to Arabic and English Videos
Karima Abidi, Dominique Fohr, Denis Jouvet, David Langlois,
Odile Mella, and Kamel Smaïli

Neural Techniques for Text and Speech

Extractive Text-Based Summarization of Arabic Videos: Issues,
Approaches and Evaluations
Mohamed Amine Menacer, Carlos-Emiliano González-Gallardo,
Karima Abidi, Dominique Fohr, Denis Jouvet, David Langlois,
Odile Mella, Fatiha Sadat, Juan-Manuel Torres-Moreno,
and Kamel Smaïli

Automatic Identification Methods on a Corpus of Twenty Five
Fine-Grained Arabic Dialects
Salima Harrat, Karima Meftouh, Karima Abidi, and Kamel Smaïli

Aggregation of Word Embedding and Q-learning for Arabic Anaphora Resolution
Saoussen Mathlouthi Bouzid and Chiraz Ben Othmane Zribi

LSTM-CNN Deep Learning Model for Sentiment Analysis of Dialectal Arabic
Kathrein Abu Kwaik, Motaz Saad, Stergios Chatzikyriakidis,
and Simon Dobnik

Sentiment Analysis of Code-Switched Tunisian Dialect: Exploring
RNN-Based Techniques
Mohamed Amine Jerbi, Hadhemi Achour, and Emna Souissi

Modeling Modern Standard Arabic

Building an Extractive Arabic Text Summarization Using a Hybrid Approach
Said Moulay Lakhdar and Mohamed Amine Chéragui

On Arabic Stop-Words: A Comprehensive List and a Dedicated
Morphological Analyzer
Driss Namly, Karim Bouzoubaa, Rachida Tajmout, and Ali Laadimi

Negation in Standard Arabic Revisited: A Corpus-Based
Metaoperational Approach
Mohamed-Habib Kahlaoui

Skew Correction and Text Line Extraction of Arabic Historical Documents
Abdelhay Zoizou, Arsalane Zarghili, and Ilham Chaker

Authorship Attribution of Arabic Articles
Maha Hajja, Ahmad Yahya, and Adnan Yahya

Keyphrase Extraction from Modern Standard Arabic Texts Based
on Association Rules
Mourad Loukam, Djamila Hammouche, Freha Mezzoudj,
and Fatma Zohra Belkredim

HPSG Grammar Supporting Arabic Preference Nouns and Its TDL Specification
Samia Ben Ismail, Sirine Boukédi, and Kais Haddar

Resources: Analysis, Disambiguation and Evaluation

“AlkhalilDWS”: An Arabic Dictionary Writing System Rich in Lexical Resources
Mohammed Reqqass, Abdelhak Lakhouaja, and Mohamad Bebah

T-HSAB: A Tunisian Hate Speech and Abusive Dataset
Hatem Haddad, Hala Mulki, and Asma Oueslati

Towards Automatic Normalization of the Moroccan Dialectal Arabic
User Generated Text
Ridouane Tachicart and Karim Bouzoubaa

Arabic Search Results Disambiguation: A Set of Benchmarks
Haytham Salhi, Radi Jarrar, and Adnan Yahya

An Arabic Corpus of Fake News: Collection, Analysis and Classification
Maysoon Alkhair, Karima Meftouh, Kamel Smaïli, and Nouha Othman

Author Index


Arabic Dialects and Sentiment Analysis
Generating a Lexicon for the Hijazi
Dialect in Arabic

Fatimah Abdullah Alqahtani¹,² and Mark Sanderson¹

¹ Computer Science, School of Science, RMIT University, Melbourne, Australia
{fatimah.alqahtani,mark.sanderson}@rmit.edu.au
² College of Computer Science and Information Systems,
Jazan University, Jazan, Kingdom of Saudi Arabia

Abstract. We present a methodology for creating a lexicon for a low-resource
Arabic dialect in Saudi Arabia: Hijazi. We show the differences between the
Hijazi dialect and Modern Standard Arabic. We annotate articles and tweets
using recruited native speakers. We create a lexicon of Hijazi adapted from two
resources: Sebawai and Quranic Arabic Corpus. The lexicon is created both
manually and automatically by using Hijazi morphology. We detail the
methodology to build this lexicon and present results of an evaluation of the
corpus formation process.

Keywords: Hijazi dialect · Lexicon generation

1 Introduction

Arabic dialects are a set of linguistic characteristics that belong to a particular
environment [1], and are often used in informal daily communication. An increased
awareness of the existence and functioning of these dialects has emerged due to the
influence of social media, where these dialects are now written.
The Egyptian, Levantine, and Moroccan dialects as well as Modern Standard Arabic (MSA)
are considered high-resource; however, others, including the Hijazi dialect, are
low-resource [2]. The lack of resources has created an obstacle for researching Hijazi.
A dialect of the western cities (Makkah, Madinah, Jeddah and Taif) in Saudi Arabia,
Hijazi is spoken in the second most populous region¹. Hijazi has two varieties, urban
and rural, and this study focuses on the urban variety. To the best of our knowledge,
no one has built resources for Hijazi.
We applied a methodology to create a Hijazi lexicon in which potential Hijazi
words were annotated through a comparison with an MSA lexicon. Then, the Hijazi
words were analyzed using an approach employed by Darwish, Sajjad and Mubarak [3]
for Egyptian dialects to automatically generate an expanded Hijazi word list. We also
annotated 3,000 tweets to identify Hijazi content. Our work addresses the following
research question: Can a methodology used to create a lexicon for the high-resource
Egyptian dialect be adapted to create a lexicon for the low-resource Hijazi dialect?

¹ 10,090,256 people in 2015 - http://www.cdsi.gov.sa.

© Springer Nature Switzerland AG 2019


K. Smaïli (Ed.): ICALP 2019, CCIS 1108, pp. 3–17, 2019.
https://doi.org/10.1007/978-3-030-32959-4_1

Section 2 reviews relevant work, and Sect. 3 presents the approach used to build the
Hijazi lexicon. In Sect. 4, Standard and Hijazi Arabic are compared. The approach
to generating the Hijazi dialect is presented in Sect. 5. Section 6 describes our
evaluation. The paper concludes and gives insight into future work in Sect. 7.

2 Related Work
2.1 Creating Resources in Arabic Dialects
Prior work on Arabic dialect corpora focused on building monolingual or parallel
corpora. Construction methods vary, though most rely on recruited native speakers.
The COLABA project collected resources from Arabic blogs for four dialects:
Egyptian, Iraqi, Levantine, and Moroccan [4]. For harvesting, Diab, Habash, Rambow,
Altantawy and Benajiba [4] asked 25 native speakers to generate 40 dialectal queries
containing words with multiple orthographies that cover social issues, religion, and
politics. The authors asked annotators to translate the queries to MSA and English. The
queries were used to extract matching blog data from the web. The researchers created
a tool to process and manage the data harvested from the blog.
Harrat, Meftouh and Smaili [5] created a parallel corpus for MSA, Algiers, Annaba,
Tunisian, Palestinian, and Syrian. They collected around 2.5K Algiers dialect sentences
from transcribed films and TV shows, which were then translated to MSA and Annaba
by a native speaker. In the same way, the researchers collected approximately 3.9K
Annaba dialect sentences from transcribed recordings of the daily life of some people
of Annaba, which were then translated to MSA and Algiers. Finally, the whole collection
of around 6.4K MSA sentences was translated to Tunisian, Palestinian and Syrian by
native translators. The dialect and MSA translations were carried out by twenty-five
people in total, free of charge.
The Gumar Corpus is a large-scale collection of Gulf Arabic consisting of 100
million words from over 1,200 novels published online [6]. The researchers manually
annotated the corpus by sub-dialect of Gulf Arabic, covering the Saudi, UAE,
Bahraini, Qatari, and Omani dialects. Annotations were at the document level. They
found that names given to the characters, cities, and event names in the novel helped
determine the dialect. The researchers extracted features and rules for these dialects to
understand the morphology and to build tools.
Most recently, there is the Curras corpus for the Palestinian dialect [7], which was
manually annotated by two annotators over one year. The researchers identified
56,700 tokens in 190 documents compiled from resources such as Facebook, Twitter,
blogs, forums, Palestinian stories, Palestinian terms, and scripts from Palestinian
TV shows. The researchers used the DIWAN Dialect Word Annotation tool [8]
and the MADAMIRA [9] morphological analyzer tool for MSA and Egyptian.
A quantitative evaluation was performed on three of the documents, which consist
of 1,529 tokens, by two annotators, who met to review and discuss their annotations.
Their agreement was measured by Kappa, and the outcome was almost perfect agreement
for different tags (e.g., POS, stem, prefix, etc.).
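
As an illustration of how such agreement can be measured, the following minimal sketch
computes Cohen's kappa for two annotators. The function name and the tag sequences are
hypothetical and invented for illustration; they are not taken from the Curras corpus,
and the Curras authors may have computed Kappa differently.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement expected from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[t] * freq_b[t] for t in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical POS tags from two annotators (illustrative data only).
ann1 = ["NOUN", "VERB", "NOUN", "ADJ", "VERB", "NOUN"]
ann2 = ["NOUN", "VERB", "NOUN", "NOUN", "VERB", "NOUN"]
kappa = cohen_kappa(ann1, ann2)
```

Values close to 1 indicate that the annotators agree far more often than chance alone
would predict.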

Darwish, Sajjad and Mubarak [3] employed manual and automatic approaches to
collect three lexicons of Egyptian Dialect. In the manual approach, they asked a linguist
to extract 1,300 high frequency Egyptian words (MAN) from the Egyptian side of the
LDC2012T09 corpus [10] while in the automatic approach, they applied Egyptian
morphology rules to generate verbs from Sebawai Arabic roots [11]. The rules added
prefixes and suffixes such as pronouns and negation that are compatible with the
Egyptian dialect. The rules also substituted letters to change a word to the Egyptian
dialect. Filters were applied to the verb and letter substitutions to remove words that
were MSA. An MSA word list was drawn from 63 million Arabic tweets and Aljazeera
articles.
Mubarak and Darwish [12] presented a multi-dialect corpus from Twitter for Saudi,
Egyptian, Iraqi, Lebanese, Syrian, and Algerian. They collected 92 million tweets
that had a user location. Each user location was assigned to one of the Arab
countries using GeoNames, which lists the locations in each country. The researchers
then manually reviewed the locations mapped with GeoNames and manually tried to map
the locations that did not match GeoNames. They also collected all word n-grams that
occurred at least three times in AOC, Aljazeera interview articles, and the GigaWord
corpus, which is a text archive of Arabic news sources from the Linguistic Data
Consortium (LDC). These n-grams were manually labelled by a native Arabic speaker
knowledgeable in different dialects, who specified whether each was a dialect word
and to which dialect it belonged. This resulted in around 2,500 dialect words. The
researchers used these dialect n-grams to filter the tweets and obtained 6.5 million
dialect tweets, based on the following assumption: “if a sentence contained one of
these n-grams, then the sentence is dialectal”. They used crowdsourcing to evaluate
100 randomly extracted tweets per dialect. They asked crowdsourcing workers from the
country in which each tweet was issued to judge whether the tweet's dialect matched
that of their country. They were not able to get judges for Qatar and Bahrain.
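
The filtering assumption quoted above can be sketched as follows. This is a minimal
illustration, not the implementation of [12]: the function names, the maximum n-gram
length of three, and the toy data in Buckwalter-style transliteration are assumptions
made for the example.

```python
def ngrams(tokens, n):
    """Return all contiguous word n-grams of length n as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def is_dialectal(sentence, dialect_ngrams, max_n=3):
    """A sentence is dialectal if it contains at least one known dialect n-gram."""
    tokens = sentence.split()
    for n in range(1, max_n + 1):
        if any(gram in dialect_ngrams for gram in ngrams(tokens, n)):
            return True
    return False

# Hypothetical dialect n-grams and tweets (illustrative data only).
dialect_ngrams = {"dHyn", "mA Adry"}
tweets = ["AnA jAy dHyn", "h*A ktAb jdyd", "wAllh mA Adry"]
dialect_tweets = [t for t in tweets if is_dialectal(t, dialect_ngrams)]
```

A single matching n-gram is enough to keep a tweet, so the quality of the filter
depends entirely on how reliably the n-gram list was labelled.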
Overall, the reviewed literature shows that many contributions have been made to
Egyptian, Levantine and North African dialects, as well as to sub-dialects of
Levantine such as Syrian, Palestinian, Jordanian and Lebanese.

2.2 Creating Resources in Low-Resource Languages


Different means of collecting and generating data for low-resource languages have been
tried. In 2011, Outahajala, Zenkouar and Rosso [13] manually built a corpus for the
Amazighe language: a low-resource language in Morocco, Algeria, Tunisia, Libya, and
areas of Egypt. They extracted text from various sources such as the Royal Institute for
Amazighe Culture’s newsletter and website as well as three primary school textbooks.
The corpus was manually annotated by a team of four annotators, who assigned
different POS features to the tokenized Amazighe texts. Three linguists chose random
texts and evaluated them. The annotator agreement was 94.89%.
Drawing on the concept of manual tagging, Ramrakhiyani and Majumder [14]
provided a corpus for temporal expression recognition in Hindi called ILTIMEX2012.
There were three temporal expression classes: date-time, a time or duration expression;
a frequency, which is a date or time expression; or period, which is a frequency
expression. The corpus is composed of 300 documents drawn from a set of Hindi
newspaper articles, the FIRE 2011 Hindi corpus [15]. Each document has more than
500 words. ILTIMEX2012 was manually labelled using the annotation module of the
General Architecture for Text Engineering (GATE) tool. The corpus includes 514 period,
110 frequency, and 1,295 date-time temporal expressions. This corpus was used for the
identification and classification of Hindi temporal expressions.
Bird, Gawne, Gelbart and McAlister [16] used an Android application that supports
direct recording of speech [17]. When finishing a recording, users were
asked to add metadata such as name, language, and image. Users could segment the
audio and write a transcription and translation. The researchers used the application for
collecting audio from Brazil and Nepal. They collected 100k words from 10 h of audio.
Challenges with this approach were the lack of participation and the scarcity of
electricity in the village, which is needed to charge the device. The study noted that
data collection for these low-resource languages began with collecting speech, which
indicates that there is no written source for these languages. Also, crowdsourcing and
human approaches are considered the conventional methods for collecting and building
data for low-resource languages.
For Hijazi, Alahmadi [18] collected more than 30 Hijazi words by asking native
Hijazi speakers to give the correct dialect word for an image.

3 Approach

In defining the methodology for our problem of building a Hijazi corpus, we considered
our situation. There are no Hijazi language resources available and only a limited
amount of edited Hijazi text is available online. However, there is a notable amount of
social media content written in Hijazi. Therefore, we examined an approach to building
a corpus using a combination of manual and automatic techniques that is adapted from
an approach by Darwish, Sajjad and Mubarak [3]. The approach requires access to an
initial Hijazi word list, a set of morphological rules, and a list of Arabic word roots.
There are two phases of the methodology: corpus creation and evaluation.

Table 1. The 10 most frequent words



We used different resources to build the Hijazi list: articles, dictionaries and building
from roots. We found a collection of Hijazi articles in Okaz news², which is read in
western Saudi Arabia; 156 Hijazi articles were collected manually from February 2011
to September 2014. The set contained 59,225 tokens. An analysis of the most frequent
words found that most were Hijazi pronouns, as shown in Table 1.
In overview, our process first pre-processes the list of Hijazi words (see Fig. 1(a)),
which involves removing MSA stop words. Next, we check whether each remaining
word matches a set of Hijazi verb rules (Sect. 4.2). If there is a match, the word is
considered Hijazi and is added to the MANHijazi list. If there is no match, the word
is compared with three MSA dictionaries: Maajim³, Alwased⁴, and Alsahah⁵. If the
word is not found in these dictionaries, then it is also considered Hijazi and is
added to the MANHijazi list.
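
The decision procedure of Fig. 1(a) can be sketched as follows. This is a simplified
illustration under stated assumptions: the function name, the toy rule (a word starting
with 'by', standing in for the verb rules of Sect. 4.2), the stop-word set and the three
small dictionaries are all hypothetical stand-ins for the actual resources.

```python
def build_man_hijazi(words, msa_stop_words, matches_hijazi_rule, msa_dictionaries):
    """Collect words considered Hijazi, following the steps of Fig. 1(a)."""
    man_hijazi = []
    for word in words:
        # Pre-processing: drop MSA stop words.
        if word in msa_stop_words:
            continue
        # A word matching a Hijazi verb rule is accepted directly.
        if matches_hijazi_rule(word):
            man_hijazi.append(word)
        # Otherwise it is accepted only if no MSA dictionary contains it.
        elif not any(word in dictionary for dictionary in msa_dictionaries):
            man_hijazi.append(word)
    return man_hijazi

# Toy data in Buckwalter-style transliteration (illustrative only).
words = ["fy", "byktb", "kbry", "jsr", "ktAb"]
msa_stop_words = {"fy"}
msa_dictionaries = [{"jsr", "ktAb"}, {"ktAb"}, set()]  # three stand-in dictionaries
toy_rule = lambda w: w.startswith("by")  # hypothetical stand-in for Sect. 4.2 rules
man_hijazi = build_man_hijazi(words, msa_stop_words, toy_rule, msa_dictionaries)
```

In this toy run, 'byktb' is accepted by the rule and 'kbry' is accepted because none of
the dictionaries lists it, while 'jsr' and 'ktAb' are rejected as MSA.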

Fig. 1. (a) Creating a manually formed Hijazi list (MANHijazi) from articles by comparing with
MSA dictionaries and Hijazi rules, (b) Creating three different lists of Hijazi words, then filtering.

Figure 1(b) illustrates the steps of the automatic approach to adding to the
MANHijazi list. First, a set of roots was drawn from two sources: 10,406 from the
Sebawai system [11] and 943 verb roots from the Quran. Each root was transformed into
multiple verbs by applying a set of Hijazi verb rules to form two lists of words:

² http://www.okaz.com.sa.
³ http://www.maajim.com/dictionary/.
⁴ http://shamela.ws/browse.php/book-7028.
⁵ http://www.almaany.com.

HijaziVQuraan and HijaziVSebawai. We used the Quran roots because they are real verb
roots, while Sebawai has a large number of roots that are automatically generated. The
coverage of this latter set is greater, but it is noisy because the roots are extracted
from different word types such as verbs, nouns and adjectives. In addition, MSA-to-Hijazi
letter substitution rules were applied to a list of MSA words from the Arabic side
of the English/Arabic parallel corpus from the International Workshop on Spoken
Language Translation (Arabic/IWSLT) 2013 to generate the list HijaziSUB1.
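
To make the root-to-verb step concrete, the sketch below applies string templates to
consonantal roots. It is a strong simplification and not the actual rule set: the two
templates are modelled only on the examples 'byktb' and 'Anktb' given in Sect. 4.2, the
names are hypothetical, and real Arabic verb formation involves patterns and affixes
that simple concatenation does not capture.

```python
# Hypothetical templates in Buckwalter-style transliteration, modelled on the
# Sect. 4.2 examples 'byktb' and 'Anktb'; the real rule set is larger.
HIJAZI_VERB_TEMPLATES = ["by{root}", "An{root}"]

def generate_verbs(roots, templates=HIJAZI_VERB_TEMPLATES):
    """Expand each root with every template and return a sorted, duplicate-free list."""
    return sorted({template.format(root=root) for root in roots for template in templates})

candidates = generate_verbs(["ktb"])
```

Every generated form is only a candidate; the comparison steps that follow decide
which candidates are kept as Hijazi.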
In the next step, the words in the three lists were compared to a list of three million
words drawn from Arabic tweets, which were pre-processed using the AraNLP tool
[19]. Words that matched were assumed to be some form of Arabic. In the final step,
the remaining words in the three lists were compared against OSAC⁶ [20], a corpus
of MSA words that contains 18,183,511 tokens. Those words that did not match
against the corpus were assumed to be Hijazi. We evaluated the lists of words
intrinsically by manually investigating a random sample of words drawn from each of
the lists.
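
The two comparison steps can be sketched as set lookups. This sketch assumes one reading
of the description, namely that candidates attested in the tweet vocabulary are kept and
that those also found in the MSA corpus are then discarded; the function name and the toy
vocabularies are hypothetical.

```python
def filter_candidates(candidates, tweet_vocab, msa_vocab):
    """Keep candidates attested in tweets, then drop those that are MSA words."""
    # Step 1: a candidate seen in the tweet vocabulary is some form of Arabic.
    attested = [word for word in candidates if word in tweet_vocab]
    # Step 2: an attested candidate absent from the MSA corpus is assumed Hijazi.
    return [word for word in attested if word not in msa_vocab]

# Toy vocabularies in Buckwalter-style transliteration (illustrative only).
candidates = ["byktb", "yktb", "byxyz"]
tweet_vocab = {"byktb", "yktb", "ktAb"}
msa_vocab = {"yktb", "ktAb"}
hijazi = filter_candidates(candidates, tweet_vocab, msa_vocab)
```

Here 'byxyz' is dropped because it never occurs in tweets, and 'yktb' is dropped because
it occurs in the MSA vocabulary, which leaves 'byktb'.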

4 Hijazi Dialect

We describe the basic syntax of the Urban Hijazi Dialect by contrasting it with MSA.
We focus on verbs and letter substitutions because these are where the main differences
between MSA and Hijazi are found. We examine irregular structures, such as lexical
and independent pronouns, and regular structures such as verb and letter substitution.

4.1 Irregular

Lexical. Three past papers [21–23] described three differences between lexical
words in MSA and Hijazi.
1. Lexical words in Hijazi that have the same meaning as in MSA but are written with
different letters. For example, (Bridge, كبري ‘kbry’ in Hijazi, جسر ‘jsr’ in MSA),
(Maybe, بلكن ‘blkn’ in Hijazi, احتمال ‘AHtmAl’ in MSA) and (Good, كويس ‘kwys’ in
Hijazi, طيب ‘Tyb’ in MSA).
2. Distinctive dialectal terms in Hijazi that come from the combination of two
words, which is called a blend in linguistics. For example, دحين ‘dHyn’ (now)
comes from the words ذا الحين ‘*A AlHyn’, and كمان ‘kmAn’ (also) comes from
the words كما إن ‘kmA <n’.
3. Words that are written the same in Hijazi and MSA but have different
meanings. For example, دول ‘dwl’, which means ‘those’ in Hijazi and ‘countries’ in
MSA.

Independent Pronouns. Independent pronouns in Hijazi are distinct from those in


MSA. The pronoun for “we” has more than one form. For example, ‘AHnA’ ‫ ﺍﺣﻨﺎ‬or

6
https://sites.google.com/site/motazsite/arabic/osac.
Generating a Lexicon for the Hijazi Dialect in Arabic 9

‘nHnA’ ‫ ﻧﺤﻨﺎ‬in Hijazi, but it is ‘nHn’ ‫ ﻧﺤﻦ‬in MSA. Also, the second-person plural pronoun is
‘AntW’‫ ﺍﻧﺘﻮ‬in Hijazi, not antum ‫ ﺃﻧﺘﻢ‬as in MSA. The second-person feminine singular pronoun ‘Anti’ ‫ ﺍﻧ ِﺖ‬can also be written
as ‘Anty’ ‫ﺍﻧﺘﻲ‬. However, the dual pronouns ‘ntmA’ ‫ ﺃﻧﺘﻤﺎ‬and ‘hmA’ ‫ ﻫﻤﺎ‬and the
feminine plural pronouns ‘Antn’ ‫ ﺍﻧﺘﻦ‬and ‘hn’ ‫ ﻫﻦ‬are not used in Hijazi.
The demonstrative pronouns in Hijazi are different from MSA. For the singular,
“this” is ‘dA’ ‫ ﺩﺍ‬for masculine and ‘dy’ ‫ ﺩﻱ‬for feminine, whereas MSA uses ‘h*A’ ‫ ﻫﺬﺍ‬and
‘h*h’ ‫ ﻫﺬﻩ‬for masculine and feminine respectively. Hijazi speakers also use
‘hAdA’ ‫ ﻫﺎﺩﺍ‬for masculine, and ‘hAdy’ ‫ ﻫﺎﺩﻱ‬for feminine, feminine plural nouns and
inanimate masculine plural nouns. For plural “these”, they use ‘hdWl’ ‫ ﻫﺪﻭﻝ‬or
‘hdWlA’ ‫ ﻫﺪﻭﻻ‬or ‘dWl’ ‫ ﺩﻭﻝ‬or ‘dWlA’ ‫ ﺩﻭﻻ‬instead of ‘h&lA‘’ ‫ ﻫﺆﻻﺀ‬in MSA.

4.2 Regular
We present the main features of Hijazi in negation, verbs, and letter substitution.
Negation. The ‘mA-’ ‫ ﻣﺎ‬prefix is the only particle used for negation in all tenses of
Hijazi verbs. However, in MSA, ‘lA’ ‫ﻻ‬, ‘lm’ ‫ﻟﻢ‬, ‘ln’ ‫ﻟﻦ‬, ‘lys’ ‫ ﻟﻴﺲ‬are used to negate a
verb in addition to ‘mA-’ ‫ﻣﺎ‬. Also, the ‘mA’ ‫ ﻣﺎ‬prefix is used for the negative pronouns
in Hijazi instead of ‘lys’ ‫ ﻟﻴﺲ‬in MSA. In negating adjectives and nouns, the word ‘mw’
‫ ﻣﻮ‬can be used while in MSA ‘lA’ ‫ﻻ‬, and ‘lys’ ‫ ﻟﻴﺲ‬are used [21].
The Verb. We consider verbs from two points of view: tense and structure. There are three
main tenses in MSA: past, present, and future. However, [21–23] showed that some
verb forms differ between MSA and Hijazi:
• Hijazi adds a verbal particle as a prefix / bi-/‫ ﺏ‬+ verb (e.g. /bi-yiktub/ ‫‘ ﺑﻴﻜﺘﺐ‬byktb’
he writes) in the expression of present tense, but MSA does not have this verbal
particle.
• Hijazi adds a verbal particle as a prefix /in-/‫ ﺇﻧـ‬or ‫ ﺍﻧـ‬+ verb (e.g. /inktub/‫‘ ﺍﻧﻜﺘﺐ‬Anktb’
was written), although some speakers add the prefix /At/‫‘ ﺍﺗـ‬At’ + verb (e.g. /Atktab/
‫‘ ﺍﺗﻜﺘﺐ‬Atktb’ was written) in the expression of passive past tense, but MSA does not
have this verbal particle.
• Hijazi adds a verbal particle as a prefix /ħa-/‫‘ ﺡ‬H’ + verb (e.g. /ħa-yiguul/‫ﺣﻴﻘﻮﻝ‬
‘Hyqwl’ he’ll say) in the expression of future tense, or adds the word /raħ/
‘rAH’ + verb (e.g. /raħ-yiguul/ ‘rAH yqwl’ he’ll say). MSA instead uses the word
‫‘ ﺳﻮﻑ‬swf’ ‘will’, which is added before the verb, or the letter ‫‘ﺱ‬s’, which is
added as a prefix to the verb.
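The three tense markers above can be sketched as a small prefix table. This is only an illustration in Python, using the Buckwalter-style transliterations from the text; the names `TENSE_PREFIXES` and `hijazi_forms` are hypothetical, and the sketch assumes the stem is already inflected for person.

```python
# Transliterations follow the Buckwalter-style scheme used in the text.
# The prefix inventory comes from the bullets above; all names are illustrative.
TENSE_PREFIXES = {
    "present": ["b"],              # bi- + verb, e.g. 'byktb' (he writes)
    "passive_past": ["An", "At"],  # in- or At- + verb, e.g. 'Anktb' (was written)
    "future": ["H", "rAH "],       # Ha- + verb, or the separate word 'rAH'
}


def hijazi_forms(stem, tense):
    """Return every surface form obtained by prefixing `stem` for `tense`."""
    return [prefix + stem for prefix in TENSE_PREFIXES[tense]]


if __name__ == "__main__":
    for tense, stem in [("present", "yktb"), ("passive_past", "ktb"), ("future", "yqwl")]:
        print(tense, hijazi_forms(stem, tense))
```

With the stems `yktb`, `ktb` and `yqwl`, the sketch reproduces the examples quoted in the bullets (`byktb`, `Anktb`/`Atktb`, `Hyqwl`/`rAH yqwl`).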
There are two types of verbs across MSA and Hijazi: sound (‫ ﺻﺤﻴﺢ‬SHyH) and weak
(‫ ﻣﻌﺘﻞ‬mEtl):
• Sound verbs are those verbs that do not include (w) ‫ ﻭ‬or (y) ‫ ﻱ‬in the root letters. In the
singular feminine past tense, Hijazi adds the letter ‘y’ to the end of the word, so that
‫‘ ﻛﺘﺒﺖ‬ktbti’ (she wrote) becomes ‫‘ ﻛﺘﺒﺘﻲ‬ktbty’ in Hijazi. Furthermore, there are
double-sound verbs (‫ ﺍﻟﻔﻌﻞ ﺍﻟﻤﻀﻌﻒ‬AlmuDEf), where the second and third letters of the
root are the same, such as ‫ ﺩﻕ‬daqqa -‫ ﻳﺪﻕ‬yadiqqu (to knock). In the singular past
tense, Hijazi removes the third root letter and adds the suffix ‫‘ ﻳﺖ‬yt’, giving
‫‘ ﺩﻗﻴﺖ‬dqyt’ (you (masculine) knocked), while MSA adds the suffix ‫‘ﺕ‬t’, giving ‫‘ ﺩﻗﻘﺖ‬dqqtu’.
10 F. A. Alqahtani and M. Sanderson

Also, there are (‫ ﺍﻟﻔﻌﻞ ﺍﻟﻤﻬﻤﻮﺯ‬mahmuuz) Hamzated verbs, where ‫ ﺀ‬is one of the
consonants, such as ‫>‘ ﺃﻛﻞ‬kal’ - ‫‘ ﻳﺄﻛﻞ‬y>kl’ (to eat). In general, there is no difference
between masculine and feminine in Hijazi for all tenses of verbs. In Hijazi, the
formulas for the masculine and feminine are the same in the plural form.
• Weak verbs are those verbs that contain ‫( ﻭ‬w) or ‫( ﻱ‬y) as one or two of the root
letters. There are three types of weak verbs. Hijazi weak verbs follow different
patterns from one another and from MSA:
• Assimilated verbs (‫)ﺍﻟﻔﻌﻞ ﺍﻟﻤﺜﺎﻝ‬, which begin with‫‘ ﻭ‬w’ or ‫‘ ﻱ‬y’ such as ‫‘ ﻭﻗﻒ‬wqf’
‫‘ ﻳﻘﻒ‬yqf’, ‘to stand up’. In the present form, MSA removes the letter ‫‘ ﻭ‬w’ and
replaces it with ‫‘ ﺍ‬A’ like ‫‘ ﺍﻗﻒ‬Aqf’, while Hijazi keeps the letter ‫‘ ﻭ‬w’ and adds
the prefix ‫‘ ﺏ‬b’ like ‫‘ ﺑﺎﻭﻗﻒ‬bAwqf’ (will stand).
• Hollow verbs (‫)ﺍﻟﻔﻌﻞ ﺍﻷﺟﻮﻑ‬, in which the second letter of the root is ‫‘ ﺍ‬A’, ‫‘ ﻭ‬w’
or ‫‘ ﻱ‬y’, such as ‫‘ ﻗﺎﻡ‬qAm’ ‫‘ ﻳﻘﻮﻡ‬yqwm’, ‘to get up’. ‫‘ ﺍ‬A’ is replaced with ‫‘ ﻭ‬w’ in
the present tense. In other verbs, ‫‘ ﺍ‬A’ is replaced with ‫‘ ﻱ‬y’ in the present tense, such as
‫‘ ﺑﺎﻉ‬bAE’ ‫‘ ﻳﺒﻴﻊ‬ybyE’, ‘to sell’.
• Defective verbs (‫)ﺍﻟﻔﻌﻞ ﺍﻟﻨﺎﻗﺺ‬, in which the root ends with ‫‘ ﻭ‬w’ or ‫‘ ﻱ‬y’,
such as ‫‘ ﺭﻣﻰ‬rmY’ ‫‘ ﻳﺮﻣﻲ‬yrmy’, ‘to throw’. ‘Y’ is replaced by ‘y’, or ‫‘ ﺍ‬A’ is
replaced by ‘w’, as in the example ‫‘ ﻧﻤﺎ‬nmA’ ‫‘ ﻳﻨﻤﻮ‬ynmw’ ‘to grow’.

Letter Substitution. Hijazi people write in a way that reflects their pronunciation.
This leads to some letter substitutions between MSA and Hijazi [21–23]. Example
substitutions are shown in Table 2.

Table 2. The letter substitutions between MSA and Hijazi

5 Experiment

This section first presents the process used to collect our Hijazi dialect corpus. We
collected data from two different domains: articles and tweets. Each domain required
a different approach to annotation, as described below. The section then describes the
method used to generate the lexicon of Hijazi dialect verbs automatically.

5.1 Articles
To obtain a word list, we split the gathered articles into tokens by using whitespace and
other punctuation characters (“.,?!”) as delimiters. In informal text, words are not
always split by whitespace, so the letter ‫“ ﻭ‬w” “and”, was also used as a delimiter.
Next, we removed Arabic stop-words using the Ranks NL list [24]. The
remaining words were confirmed as Hijazi by:
1. Checking manually if the word is a verb, then checking if it fits the Hijazi verb
structure described in Sect. 4. If yes, then it is a Hijazi verb.
2. If the word is not a verb or the name of a person or a place, then a search for the
word in three extensive MSA lexicons was conducted: Maajim, Alwased and
Alsahah. If the word was in one of the three, then it was considered an MSA word,
otherwise, it was considered a Hijazi word.
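The tokenization and the two confirmation checks above could be sketched roughly as follows. This is a minimal Python illustration, not the authors' code: the function names are hypothetical, the boolean arguments stand in for the manual judgements, and the handling of ‘w’ (detached only at the start of a token, and only if at least two letters remain) is an assumption, since the text does not say how the delimiter was applied. Cases the two checks do not cover are returned as "undecided".

```python
import re

WAW = "\u0648"  # the Arabic letter waw, which also means "and"


def tokenize(text, min_remainder=2):
    """Split on whitespace and the punctuation marks . , ? ! and detach a leading waw."""
    tokens = [t for t in re.split(r"[\s.,?!]+", text) if t]
    cleaned = []
    for token in tokens:
        # Assumption: waw is only treated as a delimiter at the start of a token,
        # and only when at least `min_remainder` letters are left.
        if token.startswith(WAW) and len(token) - 1 >= min_remainder:
            token = token[1:]
        cleaned.append(token)
    return cleaned


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop-word list."""
    return [t for t in tokens if t not in stop_words]


def label_word(word, is_verb, fits_hijazi_verb_pattern, is_proper_name, msa_lexicons):
    """Mirror the two manual checks; the boolean inputs stand in for human judgement."""
    if is_verb:
        return "Hijazi" if fits_hijazi_verb_pattern else "undecided"
    if is_proper_name:
        return "undecided"
    in_msa = any(word in lexicon for lexicon in msa_lexicons)
    return "MSA" if in_msa else "Hijazi"
```

Detaching a leading waw will also clip words whose root genuinely begins with that letter, so a real pipeline would need an extra check at this point.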
This left us with 1,363 MANHijazi words. A manual examination of the words
revealed the following distribution of POS: 904 verbs (simple present, passive past,
future), which commonly use the verb form and , 62 pronouns, prepositions, 56
adverbs, 82 adjectives, 18 question words, 12 interjections, 6 phrases and 202 nouns. The
large share of verbs suggests that the main difference between Hijazi and
MSA is in verbs.

5.2 Tweets
To label tweets, we used native Hijazi speakers. Three thousand tweets, geo-located
in the western Saudi Arabian cities of Jeddah, Makkah, Taif, Medina and
Yanbu, were collected from March to May 2014. The tweets were pre-processed
through manual inspection. All URLs, embedded images and user-related information
(i.e. display name, avatar/display image and user mention) were removed from the
tweets and their content was checked to ensure none were offensive.
Native Speakers. We obtained 3000 tweets from Mourad, Scholer and Sanderson [25]
that had locations in the western cities of Saudi Arabia: Jeddah, Makkah, Taif, Medina,
and Yanbu. Three native Hijazi speakers were recruited to annotate the 3000 tweets.
We used a questionnaire, which had the list of tweets, to ask the speakers to label the
tweets. The speakers were asked to record the tweet’s dialect type (Hijazi or non-
Hijazi) and details of which Hijazi words indicated that the tweet belonged to the
dialect. The speakers were given one week for the task.
Inter-annotator agreement, measured with Fleiss’ Kappa, was 0.89. From the annotation, we
found that 372 of the original 3000 tweets were Hijazi; Table 3 shows the
number of tweets from each city. There were 666 words in the Hijazi tweets and 311
unique word forms. Statistical comparisons between the Hijazi articles and tweets are
shown in Table 4. This approach to labeling Hijazi tweets generated only a
limited lexicon, and therefore an automatic approach is also needed to extend the
Hijazi lexicon.
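For readers unfamiliar with the agreement statistic, Fleiss' kappa can be computed from a table of per-item category counts. The sketch below is a plain-Python implementation of the standard formula; the toy data are invented for illustration and are not the annotations from this study.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of annotators who put item i in category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement per item, averaged over items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items

    # Agreement expected by chance from the overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    return (p_observed - p_expected) / (1 - p_expected)


# Invented toy data: four tweets, three annotators, columns = (Hijazi, non-Hijazi).
toy = [[3, 0], [2, 1], [0, 3], [1, 2]]
print(round(fleiss_kappa(toy), 3))
```

The function is undefined when all ratings fall in a single category (the denominator is zero), so a caller would need to handle that case separately.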

Table 3. Identifying Hijazi and non-Hijazi dialect tweets by cities

City Hijazi Non-Hijazi Total


Jeddah 200 1277 1477
Makkah 69 520 589
Taif 45 331 376
Medina 42 377 419
Yanbu 16 123 139
Total 372 2628 3000

Table 4. Description of Hijazi articles and tweets

Features                        Articles   Tweets
Number of articles/tweets       156        372
Number of words                 59223      4672
Unique words                    16270      2872
Average sentence length         96.3       10
Short words (<=3 characters)    17537      1555
Long words (>=7 characters)     9602       424
Unique Hijazi words             1367       35

5.3 Automatic Generation of Hijazi Dialect Words


The methodology used to expand the Hijazi verb lexicon follows a two-stage
approach from Darwish, Sajjad and Mubarak [3]:
1. Generating a lexicon by combining the morphological rules of the dialect with
roots (Quranic verb roots and Sebawai roots). The rules generate multiple verbs
from a single root.
2. Filtering the lexicon.

Automatic Word Generation. To generate Hijazi verbs from each root, prefixes were
added to the root to set a tense (present, future, present/future passive). In Hijazi, object
pronouns are attached to the verb as suffixes. The suffix set of the Hijazi dialect is shown in
Table 5. Also, there are ten subject pronouns: I “ ”, you “ ”, you “ ”, you “ ”,
you “ ”, we “ ”, they “ ”, they “ ”, he “ ”, she “ ”, and each pronoun has
associated suffixes from the suffix set.

Table 5. Suffixes set in Hijazi Dialect



The rules have the following form: Sign, Root_Length, Condition, Prefix, Suffix
• Sign: a comparison operator (= or >) used to compare the root length with Root_Length.
• Root_Length: the root length that a root must match (according to Sign) for the
rule to apply. Note that letter positions in the root are indexed from 0.
• Condition(s): each condition has three parts:
• index: the position(s) to check; this can be a specific index, a
range of indexes or all indexes in the root.
• verify: a letter or collection of letters to look for at the given
index.
• action: applied if the verify check succeeds; it is either
R, which removes the letter at the index, or a letter or collection of
letters to change it to, which can generate multiple roots.
• Prefix: the prefixes added at the beginning of the root. If multiple prefixes are
listed, multiple verbs are generated from a single root.
• Suffix: the suffixes added at the end of the verb generated in the Prefix
stage, chosen as appropriate for the pronouns, to generate the
Hijazi verb lists (HijaziVQuraan) and (HijaziVSebawai).
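The rule format described above can be illustrated with a small data structure. The following Python sketch is a simplified reading of that format, not the authors' implementation: the class and function names are hypothetical, each condition checks a single index (ranges and "all indexes" are omitted), and an action produces one replacement rather than several alternative roots.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Condition:
    index: int    # position in the root, counted from 0
    verify: str   # letters that trigger the action when found at `index`
    action: str   # "R" removes the letter; any other string replaces it


@dataclass
class Rule:
    sign: str                 # "=" or ">"
    root_length: int
    prefixes: List[str]
    suffixes: List[str]
    conditions: List[Condition] = field(default_factory=list)


def length_matches(rule, root):
    """Compare the root length with Root_Length using the rule's Sign."""
    if rule.sign == "=":
        return len(root) == rule.root_length
    if rule.sign == ">":
        return len(root) > rule.root_length
    raise ValueError(f"unknown sign: {rule.sign!r}")


def apply_rule(rule, root):
    """Return all verbs the rule generates from `root` (empty if it does not apply)."""
    if not length_matches(rule, root):
        return []
    stem = root
    # Work from the highest index down so a removal does not shift
    # the positions that are still to be checked.
    for cond in sorted(rule.conditions, key=lambda c: c.index, reverse=True):
        if cond.index < len(stem) and stem[cond.index] in cond.verify:
            replacement = "" if cond.action == "R" else cond.action
            stem = stem[:cond.index] + replacement + stem[cond.index + 1:]
    return [p + stem + s for p in rule.prefixes for s in rule.suffixes]
```

For example, a hypothetical rule `Rule("=", 3, [""], ["yt"], [Condition(2, "q", "R")])` applied to the root `dqq` drops the third letter and attaches `yt`, which matches the double-sound example `dqyt` given in Sect. 4.2.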
After applying the rules, the total numbers of Hijazi words generated from the two
root sets, Sebawai and Quraan, were 1,585,461 and 184,227 respectively. Examples of
the generation process are shown in Table 6.
For letter substitution, we used an MSA list of words from an English/Arabic
parallel corpus7, which consists of a dataset of around 150k sentences. We applied the
substitution rules (from Sect. 4.2): if a letter in any word of the MSA list matched a
Hijazi letter substitution, it was changed to the appropriate Hijazi letter. We
obtained 3,800 letter-substituted words in Hijazi Arabic (HijaziSUB).
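A letter-substitution pass of this kind might look like the sketch below. Because the contents of Table 2 are not reproduced in this text, the two mappings are assumptions made purely for illustration: `*` to `d` is inferred from MSA ‘h*A’ versus Hijazi ‘hAdA’, and `D` to `z` is taken from the ‘AlrDa’ example discussed in Sect. 6. The function names are hypothetical.

```python
# Hypothetical substitution table in Buckwalter-style transliteration.
# Table 2 is not reproduced here, so these two pairs are assumptions:
# '*' -> 'd' is inferred from MSA 'h*A' vs Hijazi 'hAdA', and
# 'D' -> 'z' from the 'AlrDa' -> 'Alrza' example discussed in Sect. 6.
SUBSTITUTIONS = {"*": "d", "D": "z"}
_TABLE = str.maketrans(SUBSTITUTIONS)


def substitute(word):
    """Return the substituted form, or None if no mapped letter occurs in `word`."""
    candidate = word.translate(_TABLE)
    return candidate if candidate != word else None


def build_candidates(msa_words):
    """Collect the substituted forms of every MSA word that contains a mapped letter."""
    return {c for c in map(substitute, msa_words) if c is not None}
```

Applying the mapping blindly also turns ‘AlrDa’ into ‘Alrza’, which Sect. 6 reports as a non-Hijazi form; this kind of over-generation is one reason the candidate list has to be filtered afterwards.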
Filtering the Generated Words. The lists of Hijazi words (HijaziVQuraan,
HijaziVSebawai, and HijaziSUB) were filtered using steps 2 and 3, as shown in Fig. 1(b).
The purpose of step 2 was to remove ambiguous or erroneous words produced by one of the
automatic generation methods. This technique has a disadvantage: Hijazi words that
are not used in the tweets may also be deleted. We obtained 30,389, 25,599 and
1,630 words for HijaziVSebawai, HijaziVQuraan, and HijaziSUB respectively from this
step. Then, we applied step 3 to ensure there were no MSA words in the lists. In the
end, the total numbers of words in HijaziVSebawai, HijaziVQuraan, and HijaziSUB after step 3 were
24,413, 12,428 and 1,074 respectively. HijaziVSebawai and
HijaziVQuraan share around 10,595 verbs, while there is no intersection between the
verb lists (HijaziVSebawai and HijaziVQuraan) and HijaziSUB.
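The two filtering steps amount to a set intersection followed by a set difference. The Python sketch below assumes each resource has already been loaded as a collection of normalized word forms; the function and parameter names are hypothetical.

```python
def filter_candidates(candidates, tweet_vocabulary, msa_vocabulary):
    """Keep candidates attested in tweets (step 2), then drop MSA words (step 3)."""
    candidates = set(candidates)
    attested = candidates & set(tweet_vocabulary)  # step 2: must occur in Arabic tweets
    return attested - set(msa_vocabulary)          # step 3: must not occur in the MSA corpus
```

Expressed this way, the drawback noted above is easy to see: any genuine Hijazi form that happens to be missing from the tweet vocabulary is removed in step 2.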

7
From the International Workshop on Arabic Language Translation.

Table 6. An example illustrating the automatic generation of Hijazi words

6 Evaluation

Comparative evaluation with past work is not possible since, to the best of our
knowledge, this is the first study to generate a Hijazi lexicon. Instead, we used intrinsic
evaluation to estimate the accuracy of the three Hijazi lexicons. We manually investigated
a random 2% sample from each of HijaziVSebawai, HijaziVQuraan, and HijaziSUB (490,
250, and 22 words, respectively). The
error rate (ER) was calculated as the proportion of error words in the sample. As shown in
Table 7, HijaziVQuraan has the highest proportion of Hijazi dialect words (227 of 250), with an
error rate of 0.09. In contrast, HijaziSUB has the highest error rate, 0.27 over 22 words,
while the error rate of HijaziVSebawai was 0.16. An analysis of the errors (see Table 7)
found three error types:

• Alternative root: the generated verb looks like a Hijazi word, but a Hijazi speaker
would use a different root.
• In HijaziVSebawai, this was the most frequent error type, with 69 error
words. For example, the generated word ‘AtEm$’, ‘weak eyesight’, from the root
‘Em$’, is not what a Hijazi speaker would use; Hijazi instead uses the adjective
‘AEmY’, ‘blind’, or ‘DEf nZrh’, or the verb ‘AnEmY’, ‘to blind’.
• In HijaziVQuraan, there were 14 such errors. Examples include
‘Hymyd’, “to shake”, where the root ‘myd’ is replaced with ‘hz’
to give ‘Hyhz’.
• Incorrect root: the generated verb looks like a Hijazi word, but the root is not an
MSA verb root. Incorrect roots were due to errors in the automatically created
Sebawai list [11]. For example, the Sebawai root ‫‘ ﻣﻠﻢ‬mlm’, ‘has knowledge’, is
a noun rather than a verb root according to the Maajim dictionary; applying
the Hijazi rule to it produced ‫‘ ﺣﺘﻤﻠﻢ‬Atmlm’ in HijaziVSebawai, which is not a Hijazi verb.
• Error rule: a rule was applied to a word to which it should not have been applied. In
HijaziSUB, the errors are due to the mistaken application of rules to a word or root. For example, in
the word ‫‘ ﺍﻟﺮﺿﺎ‬AlrDa’ “satisfaction”, the letter ‫“ ﺽ‬D” was changed to ‫‘ ﺯ‬z’ to give
‘Alrza’, which is not a Hijazi word.

Table 7. The distribution of Hijazi and non-Hijazi words in the evaluation (the last three columns give the distribution of errors by type)

Type of list     No. words   Sample words (2%)   Hijazi words in 2%   Non-Hijazi words in 2%   Error rate   Alternative Hijazi root   Incorrect root   Error rule
HijaziVSebawai   24,413      490                 409                  81                       0.16         69                        6                6
HijaziVQuraan    12,428      250                 227                  23                       0.09         14                        0                8
HijaziSUB        1,074       22                  16                   6                        0.27         0                         0                6
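The error rates in Table 7 follow directly from the sample counts. As a quick check, the Python sketch below recomputes them from the sample sizes and the numbers of non-Hijazi words in the table; the names are hypothetical.

```python
# (sample size, non-Hijazi words in the sample), copied from Table 7.
SAMPLES = {
    "HijaziVSebawai": (490, 81),
    "HijaziVQuraan": (250, 23),
    "HijaziSUB": (22, 6),
}


def error_rate(sample_size, error_words):
    """Proportion of sampled words judged not to be Hijazi."""
    return error_words / sample_size


for name, (size, errors) in SAMPLES.items():
    print(f"{name}: {error_rate(size, errors):.3f}")
```

The recomputed values for HijaziVQuraan and HijaziSUB round to the reported 0.09 and 0.27. The HijaziVSebawai value is about 0.165, which corresponds to the reported 0.16 only if the figure was truncated rather than rounded.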

This technique of using the morphological rules of Hijazi is a good
starting point for generating Hijazi words automatically: we started from 1,363 words and
now have 24,413, 12,428 and 1,074 for HijaziVSebawai, HijaziVQuraan, and HijaziSUB
respectively. The intrinsic manual evaluation of HijaziVSebawai and HijaziVQuraan shows
a trade-off: one has more words with a higher error rate, while the other has
fewer words with a lower error rate.

7 Conclusion

In this paper we asked the following research question: can a methodology used to
create a lexicon for high-resource Egyptian be adapted to create a lexicon for
low-resource Hijazi?
We explained a method for building a lexicon of a low-resource Arabic dialect in
Saudi Arabia, Hijazi, using human experts. We expanded a lexicon of Hijazi words by

covering different Hijazi morphological rules, using both manual and automatic approaches,
which can be used in this research and in the future as a benchmark. Morphological rules
were applied to two different root sets: Quranic and Sebawai roots. The generated Hijazi
lexicon can be used for computational linguistic research and NLP tools. This lexicon
would help in applications such as electronic translation.
We evaluated the Hijazi lexicon manually. Based on the results of our experiments, we
found that the methodology can be applied to create a Hijazi lexicon. We also found
that describing the linguistic phenomena of Hijazi is the first step that allows us to build a
Hijazi lexicon. For future work, we plan to expand the size of our corpus to maximize
the coverage of the Hijazi dialect lexicon and its mapping to MSA. We
also plan to develop a morphological analyzer for the Hijazi dialect, and then build an
automatic classifier to annotate the Hijazi dialect.

References
1. Anis, I.: On The Arabic Dialects. Maktabat al-Anglo al-Misriyya, Cairo (1952)
2. Biadsy, F., Hirschberg, J., Habash, N.: Spoken Arabic dialect identification using
phonotactic modeling. In: Proceedings of the EACL 2009 Workshop on Computational
Approaches to Semitic Languages 2009, pp. 53–61. Association for Computational
Linguistics (2009)
3. Darwish, K., Sajjad, H., Mubarak, H.: Verifiably effective Arabic dialect identification. In:
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing
(EMNLP), pp. 1465–1468 (2014)
4. Diab, M., Habash, N., Rambow, O., Altantawy, M., Benajiba, Y.: COLABA: Arabic dialect
annotation and processing. In: LREC Workshop on Semitic Language Processing, pp. 66–74
(2010)
5. Harrat, S., Meftouh, K., Smaili, K.: Creating parallel Arabic dialect corpus: pitfalls to avoid.
In: 18th International Conference on Computational Linguistics and Intelligent Text
Processing (CICLING) (2017)
6. Khalifa, S., Habash, N., Abdulrahim, D., Hassan, S.: A large scale corpus of Gulf
Arabic. arXiv preprint arXiv:1609.02960 (2016)
7. Jarrar, M., Habash, N., Alrimawi, F., Akra, D., Zalmout, N.: Curras: an annotated corpus for
the Palestinian Arabic dialect. Lang. Resour. Eval. 51(3), 745–775 (2017)
8. Al-Shargi, F., Rambow, O.: DIWAN: a dialectal word annotation tool for Arabic. In:
Proceedings of the Second Workshop on Arabic Natural Language Processing, pp. 49–58
(2015)
9. Pasha, A., et al.: MADAMIRA: a fast, comprehensive tool for morphological analysis and
disambiguation of Arabic. In: LREC, vol. 14, pp. 1094–1101 (2014)
10. Zbib, R., et al.: Machine translation of Arabic dialects
11. Darwish, K.: Building a shallow Arabic morphological analyzer in one day. In: Proceedings
of the ACL-02 Workshop on Computational Approaches to Semitic Languages (2002)
12. Mubarak, H., Darwish, K.: Using Twitter to collect a multi-dialectal corpus of Arabic. In:
Proceedings of the EMNLP 2014 Workshop on Arabic Natural Language Processing
(ANLP), pp. 1–7 (2014)
13. Outahajala, M., Zenkouar, L., Rosso, P.: Building an annotated corpus for Amazighe. In:
Proceedings of 4th International Conference on Amazigh and ICT (2011, to appear)

14. Ramrakhiyani, N., Majumder, P.: Approaches to temporal expression recognition in Hindi.
ACM Trans. Asian Low-Resour. Lang. Inf. Process. 14(1), 2 (2015)
15. Palchowdhury, S., Majumder, P., Pal, D., Bandyopadhyay, A., Mitra, M.: Overview of FIRE
2011. In: Majumder, P., Mitra, M., Bhattacharyya, P., Subramaniam, L.Venkata, Contractor,
D., Rosso, P. (eds.) FIRE 2010-2011. LNCS, vol. 7536, pp. 1–12. Springer, Heidelberg
(2013). https://doi.org/10.1007/978-3-642-40087-2_1
16. Bird, S., Gawne, L., Gelbart, K., McAlister, I.: Collecting bilingual audio in remote
indigenous communities. In: Proceedings of COLING 2014, the 25th International
Conference on Computational Linguistics: Technical Papers, pp. 1015–1024 (2014)
17. Hanke, F.R., Bird, S.: Large-scale text collection for unwritten languages. In: Proceedings of
the Sixth International Joint Conference on Natural Language Processing, pp. 1134–1138
(2013)
18. Alahmadi, S.D.: Loanwords in the urban Meccan Hijazi dialect: an analysis of lexical
variation according to speakers’ sex, age and education. Int. J. Engl. Linguist. 5(6), 34
(2015)
19. Althobaiti, M., Kruschwitz, U., Poesio, M.: AraNLP: a Java-based library for the processing
of Arabic text. In: Proceedings of the 9th Language Resources and Evaluation Conference
(LREC), Reykjavik, pp. 4134–4138 (2014)
20. Saad, M.K., Ashour, W.: OSAC: open source arabic corpora. In: Proceedings of the EEECS
2010 the 6th International Symposium on Electrical and Electronics Engineering and
Computer Science, pp. 118–123. European University of Lefke, Cyprus (2010)
21. Sieny, M.E.: The Syntax of Urban Hijazi Arabic. Librairie du Liban, Beirut (1973)
22. Omar, M.K.: Saudi Arabic, Urban Hijazi Dialect: Basic Course. Foreign Service Institute,
Washington, D.C. (1975)
23. Arabic Variant Identification Aid. http://terpconnect.umd.edu:80/*nlynn/AVIA/Level3/index.htm.
Accessed 06 May 2015
24. RANKS NL - Arabic stopword list. https://www.ranks.nl/stopwords/arabic. Accessed 02 Feb
2015
25. Mourad, A., Scholer, F., Sanderson, M.: Language influences on Tweeter geolocation. In:
Jose, J.M., Hauff, C., Altıngovde, I.S., Song, D., Albakour, D., Watt, S., Tait, J. (eds.)
ECIR 2017. LNCS, vol. 10193, pp. 331–342. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56608-5_26
Natural Arabic Language Resources for
Emotion Recognition in Algerian Dialect

Habiba Dahmani1(B) , Hussein Hussein2 , Burkhard Meyer-Sickendiek2 ,


and Oliver Jokisch3
1
Department of Electrical Engineering,
University of Mohamed Boudiaf, M’Sila, Algeria
habiba.dahmani@univ-msila.dz
2
Department of Literary Studies, Free University of Berlin, Berlin, Germany
{hussein,bumesi}@zedat.fu-berlin.de
3
Institute of Communications Engineering,
Leipzig University of Telecommunications (HfTL), Leipzig, Germany
jokisch@hft-leipzig.de

Abstract. In this paper, we present and describe our first work to design
and build a natural Arabic audio-visual database for the computational
processing of emotions and affect in speech and language, which will be
made available to the research community. It is high time to have spontaneous
data representative of Modern Standard Arabic (MSA) and its
dialects. The database consists of audio-visual recordings of Arabic
TV talk shows. We chose to cover the different dialects alongside
MSA. As a first step, we present sample data of the Algerian dialect. It
contains two hours of audio-visual recordings of the Algerian TV talk
show “Red line”. The data covers 14 speakers with 1,443 utterances,
which are complete sentences. 15 emotions were investigated, of which five
are dominant: enthusiasm, admiration, disapproval, neutral, and joy.
The emotion corpus is used in classification experiments with a variety
of acoustic features extracted by openSMILE. Several classification
algorithms are applied using the WEKA toolkit. Low-level audio features
and the corresponding delta features are utilized. Statistical functionals
are applied to each of the features and delta features. The best classification
result, measured by the weighted average F-measure, is 0.48 for
the five emotions.

Keywords: Emotion recognition · Arabic language resources · Algerian dialect

1 Introduction
Emotions are omnipresent whether we speak or not. They are a continuous state of
mind. Emotions are the mind of our verbal and non-verbal reactions. Without
emotions, our reactions are incomprehensible. Without even a partial or minimal
presence of emotions, we are somehow sick or less intelligent than we should
© Springer Nature Switzerland AG 2019
K. Smaïli (Ed.): ICALP 2019, CCIS 1108, pp. 18–33, 2019.
https://doi.org/10.1007/978-3-030-32959-4_2
Gavito (Francisco), Rimas. Mejico, 1843.
Geiger (John L.), A Peep at Mexico. London, 1874.
Gelves (Marqués de), Protesto del Virrey á la audiencia. MS.
Genius of Liberty. Vera Cruz, 1847 et seq.
Gentry (M. P.), Speech on Mexican War, Dec. 16, 1846.
Washington, 1846.
Geyer (Otto Ferdinand), Panorama of Mexico. Mobile, 1835.
Gil (Francisco), Fundacion de la Obra Pia, que para el mayor
culto. Mexico, 1775. 4to.
Giles (W. F.), Speech in U. S. House of Rep. Washington, 1849.
Gilliam (Albert M.), Travels in Mexico. Philadelphia, 1846.
Gimenez (Manuel María), El Mérito Verdadero. n.pl., 1821.
Giordan (Francis), Description de l’Isthme de Tehuantepec. Paris,
1838; Réponse au libelle sur la Colonie du Goazacoalco. Paris,
1831.
Girard (Just), Excursion au Mexique. Tours, 1867.
Gobernador (El), del Departamento al Público de Mexico en las
quejas. Mexico, 1837.
Gobierno de la Iglesia, Dudas que se proponen. Mexico, 1826.
Gobierno Político de N. España. MS.
Godoy (Diego), Relacion á Hernando Cortés Mayo de 1524. In
Barcia, Hist. Prim., tom. i.
Godoy (José María), Discurso Pronunciado. Mexico, 1846.
Goggin, Speech in U. S. House of Rep., Feb. 1st, 1848. n.pl., n.d.
Goldschmidt (Albert), see Cartography of the Pacific Coast.
Gomara (Francisco Lopez), Crónica de la Nueua España con la
Conquista de Mexico y otras Cosas Notables. Saragossa.
[1554.]
Gomara (Francisco Lopez), Historias de las Conquistas de
Hernando Cortés. [Chimalpain edition.] Mexico, 1826. 2 vols.
Gomara (Francisco Lopez) Historia di Don Fernando Cortés.
Venetia, 1560.
Gomara (Francisco Lopez), Historia de Mexico. Anvers, 1554.
Gomara (Francisco Lopez), La Historia General de las Indias.
Anvers, 1554.
Gomez, Diario. In Doc. Hist. Mex., serie ii., tom. vii.
Gomez (José), Anales de Mexico 1776-1798. [Mex., 1832. MSS.];
Vida de la Madre Antonia de San Jacinto. Mex., 1689. 4to.
Gomez (Man.), Vindicacion del Primer Ayudante. Mexico, 1840.
Gomez de Avellaneda, Guatimozin, último Emperador de Méjico.
Méjico, 1853.
Gomez y Anaya (Cirilo), Defensa legal que hizo por Gen. Negrete.
Mexico, 1828. 4to.
Gonzalez, Relacion de la Marcha de la Brigada Gonzalez. Toluca,
1857.
Gonzalez (Estéban Diaz), Contestacion en derecho de los
conventos. Mexico, 1830.
Gonzalez (Fernando Alonso), Regla de N. S. P. S. Francisco.
Mexico, 1725. 4to.
Gonzalez (Joaquin), Anotaciones al Papel Titulado Informe.
Mexico, 1822.
Gonzalez (José Eleuterio), Coleccion de Noticias y Documentos
para la Hist. del estado de N. Leon. Monterey, 1867.
Gonzalez de Aragon (Francisco), Dictámen Presentado por el
Síndico sobre las Contratas de Limpia de Ciudad. Méjico, 1834.
Gonzalez y Avendaño (Franciscus), Parhelion Marianum, Mexici
Subirbijo. Mexico, 1757. 4to.
Gonzalez y Zúñiga (Anna Ma.), Florido Ramo que tributa en las
fiestas de Guadalupe. Mexico, 1748. 4to.
Gonzalo (Vict. Lopez), Obispo de Puebla. [Puebla, 1784.]
Gordoa (José Miguel), Reflecsiones que se hicieron por su autor á
consulta del Hon. Cong. de Zacatecas. Mexico, 1827.
Gordon (Thomas F.), The History of Ancient Mexico. Philadelphia,
1832. 2 vols.
Gordon (Thomas F.), Spanish Discoveries in America prior to
1520. Philadelphia, 1831. 2 vols.
Gorosito (Francisco de) Cartilla en Diálogos acerca de la
confesion. Mexico, 1703.
Gorostiza (M. E. de), A sus conciudadanos, ó breve reseña de las
operaciones del Ministro de Hacienda. Mexico, 1838. 4to.
Government and People of the United States and to those of
Spanish-America, n.p., n.d.
Grambila y Arriaga (Antonio), Tumultos de Mexico, n.p., n.d. MS.
folio.
Granados y Galvez (Joseph Joaquin), Tardes Americanas.
Mexico, 1778.
Grandeza y Excelencias de los siete príncipes de los Angeles.
Mexico, n.d. MS.
Grant (U. S.), Illustrated Life, Campaigns, and Public Services of.
Philadelphia, 1865.
Gratitud del Ayuntamiento Constitucional de la Villa de Coyoacan.
Mejico, 1820.
Gray (Albert Zabriskie), Mexico as it is. New York, 1878.
Gray (A. B.), Report and map of Mex. Boundary. Washington,
1853.
Green (George M.), Statement of his recollections of life in Mexico,
1853-55. MS.
Greene, Speech on Ten Regiment Bill in U. S. Sen., Feb. 18,
1848. Washington, 1848.
Gregory (Samuel), History of Mexico. Boston, 1847.
Grone (Carl von), Briefe über Nord-Amerika und Mexiko.
Braunschweig, 1850.
Guadalajara, Conducta observada por el Gobierno Eclesiástico.
Guadalajara, 1859. 4to; Discurso pronunciado. Mex., 1824;
Ecsámen Público. Guadal., 1841; Ecsposicion del Cabildo.
Guadal., 1824. folio; Espolios de los Sres Obispos 1759. MS.
folio; Explicaciones que el Mayordomo. Guadal., 1865. 4to;
Exposicion hecha por el Ayuntamiento. Guadal., 1844; Gaceta
de Gobierno. Guadal., 1821; Gobierno Eclesiástico. Guadal.,
1859; Representacion del Obispo sobre Cementerios. [Guad.,
1847]; Obispo de, Carta Pastoral. Guadal., 1859;
Observaciones que hace el V. Cabildo. Guadal., 1842.
Observaciones que sobre el projecto de Bases orgánicas hacen
el Obispo. n.pl. 1843; Préstamos, Contribuciones y Exacciones
de la Iglesia de Guadalajara. Guadal., 1847; Protesta del
Obispo y Cabildo de la Santa Iglesia de Guadalajara. Guadal.,
1847; Protesta del Obispo. Guadal., 1848; Real cédula de
ereccion del Consulado. Guad., 1795. folio; Relacion cristiana
de los males que ha sufrido. Guadal., 1811. 4to. Representacion
de la Junta de Fomento de Comercio. Guad., 1852. 4to. El
Tribuno, 1827. folio; Ultraje á las autoridades por los Canónigos.
Mex., 1825.
Guadalupe (Nuestra Sra de), Coleccion de obras y opúsculos
pertenecientes á la milagrosa aparicion. Madrid, 1785.
Guanajuato, Continuacion de las contestaciones entre el Goberdr
y Cabildo de Michoacan. Morelia, 1857. 4to; Constitucion
Política. Mex., 1826; Cuenta de la Fábrica de la Alhóndiga de
Granaditas. MS., 1809. folio; Esposicion que el Sup. Gobr.
Guan., 1844; Expediente Instruido sobre el Establecimiento de
un Presidio en Atargea. Guan., 1848; Informe que de su
administracion. Guan., 1826; Informe leido por Gobr del Estado.
Mex. [1853]; Memoria del Gobr. Mex. [1852]; Programa de las
Funciones. Guan., 1846; Pública Vindicacion del Ayuntamiento
de Santa Fé. Mex., 1811; Representacion que el Ayuntamiento.
Guan., 1840; Segundo Certámen jurídico y literario. Mex., 1852.
Guardia (J. M.), Les Républiques de l’Amérique Espagnole. Paris,
1862.
Guazacoalco, Colonie du, Dans L’État de Vera Cruz. Paris, 1829;
Notes pour servir d’instruction. Paris, n.d.
Güemes y Horcasitas (Juan Francisco), Banco de Mineros.
Mexico, 1747. 4to.
Guénot (Estévan), Proyecto de utilidad comun. Mexico, 1839. 4to.
Guerra (Diego), Memorial al Rey sobre Religiosos. [México, 1631.]
folio.
Guerra (José), Historia de la Revolucion de Nueva España.
Londres, 1813. 2 vols.
Guerra (Juan Alvarez), Modo de extinguir la Deuda Pública.
Mexico, 1814.
Guerra de España con Méjico, Segundo Artículo. Paris, 1857.
Guerra Eterna á este Congreso. [Mexico, 1833.]
Guerra entre Mexico y los Estados-Unidos, Apuntes para la
Historia de. Mexico, 1848.
Guerra á todo militar oficinista. Mexico, 1821.
Guerrero, Constitucion política del Jun. 1851. [Guerrero, 1851.] El
Estado de Guerrero en la Exposicion Nacional, 1870. Mexico,
1870; Memoria presentada á Legislatura. Chilpancingo, 1872;
Noticia extraord. de la Muerte. Mex., 1823. folio.
Guerrero (Isidoro), Exposicion Corta. Mexico, 1874.
Guerrero (José María), Dictámen Teológico contra el ensayo
sobre tolerancia religiosa. Mexico, 1831.
Guerrero (Mariano Soto), Proclama á favor de todos los buenos y
contra todos los malos. Mexico, 1812.
Guerrero (Vicente), A los ciudadanos militares. Mexico. [1821.]
folio. Expulsion of Spaniards. 4to. Ilustres habitantes de la gran
Mexico. [Mexico, 1822]. folio; Manifiesto á sus compatriotas.
Mexico, 1829; El Presidente de los Estados-Unidos Mex.,
Compatriotas. Mexico, 1829. El Presidente de la República.
Mex., 1829. folio; Proclamacion. Mex., 1822. folio; El Soberano
Estado de Oajaca. Oajaca, 1833. 4to; Sumaria Averiguacion.
Oaxaca, 1831. folio.
Guevara (Balthazar Ladron de), Manifiesto, que el Real Convento
de Religiosas. [Mex.] 1771. 4to.
Guevara (Juan), Proceso contra. MS.
Guevara (Miguel Tadeo de), Sumario de las Indulgencias. Mexico,
1787.
Guia de Hacienda de la Rep. Mex. año de 1827. [Mexico, 1827.]
Guia para el conocimiento de monedas y medidas. Mexico, 1825.
Guridi (José Miguel), Apología de la aparicion de Nuestra Señora
de Guadalupe. Mexico, 1820.
Guridi y Alcocer (José Miguel), Sermon predicado en accion de
gracias. Mexico, 1808.
Guridi y Alcocer (José Miguel), Exhortacion que para el Juramento
de la Constitucion en la Parroquia del Sagrario. Mexico, 1820.
Guridi y Aleva (José María), Ley, Justicia y Verdad. Mexico, 1828.
Gutierrez (Blas. J.), de Reforma. Mexico, 1868.
Gutierrez (José Ign.), Contestacion al libelo intitulado “Apuntes
para la Historia.” Mex., 1850; Documentos justificativos de la
conducta pública 1810-21. Mex., 1850; Voto á favor de los
primeros caudillos de la libertad Americana. [Mex.] 1822. 4to.
Gutierrez de Estrada (J. M.), Algunas observaciones sobre el
oficio que con fecha 22 de Julio. Mex., 1835; Carta al
Presidente sobre la necesidad de buscar el remedio. Mex.,
1840; Discurso pronunciado en el palacio de Miramar. Paris,
1863; Documentos relativos á la separacion de la 1a sec. de
Estado. Mex., 1835; Mejico y el Archiduque Fern. Max. Paris,
1862; Mejico en 1840 y en 1847. Paris, 1848; Mexico y el
Archiduque Maximiliano. Mex., 1863. 4to.
Gutierrez de Villanueva (José), Discurso 20 de Abril de 1834.
Mejico, 1836.
Guzman (José María), Breve y sencilla narracion del Viage.
Mexico, 1837.
Guzman (Leon), Cuatro Palabras sobre el asesinato del señor
General D. Juan Zuazua. Monterey, 1860.
Guzman (S. M. Gonzalo de), Carta que escribe á 8 de Marzo de
1529. In Pacheco and Cárdenas, Col. Doc., tom. xiii.
H. (R. G.), Memoria sobre la Propiedad Eclesiástica. Mexico,
1864.
Hacienda, Real Decreto para el Establecimiento del General
sistema de. Mexico, 1817.
Hale (J. P.), Speech on increase of army in Mexico. Jan. 6, 1848.
Washington, 1848.
Hall (Basil), Voyage au Chili, au Pérou et au Méxique. 1820-22.
Paris, 1834. 2 vols.
Hall (Frederic), Invasion of Mexico by the French. New York, 1868.
Hall (Frederic), Life of Maximilian I. New York, 1868.
Hall (Wm. M.), Speech in favor of a National Railroad to the
Pacific, July 7, 1847. New York, 1853.
Halleck (H. W.), Mining Laws of Spain and Mexico. San Francisco,
1859.
Hamersley (L. R.), Records of living officers of U. S. Navy.
Philadelphia, 1870.
Hardy (Lieut. R. W. H.), Travels in the Interior of Mexico. London,
1829.
Haro (Benito), Memoria justificada de la conducta. Mexico, 1857.
Haro y Tamariz (Antonio de), A sus compatriotas. n.pl., 1856;
Esposicion que dirige á sus Conciudadanos y Opiniones del
autor sobre la monarquía constitucional. Mex., 1846; Estracto
del Espediente sobre deuda Esterior. Mex., 1846.
Hart (Charles), Remarks on Tabasco. Philadelphia, 1867.
Haven (Gilbert), Our Next-Door Neighbor. New York, 1875.
Haven (S. G.), Remarks on Ten Million Mexican Treaty Bill, June
27, 1854. Washington, 1854.
Hay (Guillermo), Apuntes geográficos, estadísticos é históricos del
distrito de Texcoco. Mexico, 1866.
Hayes (Benjamin), Mexican Laws. Notes on. MS.
Hazart (Cornelium), Kirchen-Geschichte. Wien, 1678-84. 2 vols,
folio.
Hefelé (Dr.), Le Cardinal Ximenès Franciscain et la situation de
l’Eglise en Espagne. Paris, 1856.
Heller (Carl B.), Mexico. Wien, 1864.
Heller (Carl B.), Reisen in Mexiko in den jahren 1845-8. Leipzig,
1853.
Helps (Arthur), The Conquerors of the New World and their
Bondsmen. London, 1848-52. 2 vols.
Helps (Arthur), Life of Hernando Cortés. New York, 1871. 2 vols.
Helps (Arthur), The Life of Las Casas. Philadelphia, 1868.
Helps (Arthur), The Spanish Conquest in America. London, 1855-
61. 4 vols.; also New York, 1856. 2 vols.
Henley (Thomas J.), The War with Mexico. Speech in U. S. House
of Rep., Jan. 26, 1848. Washington, 1848.
Henriquez (Martin), Instruccion 25 de Setiembre de 1580. In
Pacheco and Cárdenas, Col. Doc., tom. iii.
Henry (Capt. W. S.), Campaign Sketches of the War with Mexico.
New York, 1847.
Heraldo (El). Mexico, 1848 et seq.
Heredia (José M.), Miscelanea, Periódico Crítico y Literario.
Tlalpam, 1829; Poesía inédita. Mex., 1848; Poesías. Mex.,
1852; Sila, Tragedia en Cinco Actos. Mex., 1825.
Heredia y Sarmiento (Josef Ignacio), Oracion fúnebre que en las
solemnes exequias. Mexico, 1808.
Heredia y Sarmiento (Jos. Ign.), Sermon Panegírico de la gloriosa
aparicion de Nuestra Señora de Guadalupe 12 de Dic. de 1801.
MS. Mexico, 1803.
Hermosa (Jesus), Enciclopedia Popular Mejicana. Paris, 1857.
Hermosilla (J. G.), El Jacobinismo. Mejico, 1834. 3 vols.
Hernandez (Francisco), Nova Plantarvm Animalivm et Mineralivm
Mex. historia. Romæ, 1651. 4to.
Hernandez (Gregorio), Causa de Poligamia ante el Tribunal de la
Inquisicion en Mexico. MS., 1771-4. folio.
Hernandez (José Gerónimo), Grande Empresa de Minas. Mejico,
1861.
Hernandez (José María Perez), Compendio de la Geografía del
Estado de Michoacan de Ocampo. Mexico, 1872.
Hernandez (José María Perez), Estadística de la República
Mejicana. Guadalajara, 1862.
Hernandez y Dávalos (J. E.), Estado de Jalisco. In Soc. Mex.
Geog., Boletin, 2a. Ep. iii.
Herrera (Antonio de), Descripcion de las Indias Occidentales.
Madrid, 1730. folio.
Herrera (Antonio de), Historia General de los Hechos de los
Castellanos en las Islas i Tierra Firme del Mar Océano. Madrid,
1601. 4to. 4 vols.; also edition Madrid, 1726-30. folio.
Herrera (J. Antonio de), Vindicacion que hace de los cargos del
Señor D. Plácido Vega. Durango, 1861.
Herrera (José J. de), Breves Ideas sobre el arreglo provisional
para el Ejército Mexicano. Mex., 1845; Discurso al prestar el
juramento para entrar al ejercicio de la presidencia. Mex., 1845;
Ley orgánica de la guardia nacional. Mex., 1848; Proyecto de
estatuto del ejército Mexicano. Mexico, 1848.
Herrera y Gomez (J. M. Fernandez de), Crítica sobre las Cartas
Americanas de Rinaldo. Querétaro, 1820. MS.
Hidalgo, La Defensa del Cura. n.pl., 1811.
Hidalgo (D. J.), Apuntes para Escribir la Historia de los Proyectos
de Monarquía en Mexico. Mex., 1868; Copia del Expediente
relativo al Lugar del Nacimiento del. Mex., 1868.
Hidalgo y Costilla (Miguel), Biografía del Cura de Dolores. Paris,
1869.
Hill (S. S.), Travels in Peru and Mexico. London, 1860. 2 vols.
Hillard (G. S.), Life and Campaigns of G. B. McClellan.
Philadelphia, 1864.
Hilliard, Speech on the Mexican War, in U. S. House of Rep., Jan.
5, 1847. Washington, 1847.
Historia de la Milagrosa Imágen de Nuestra Señora del Pueblito,
etc. n.pl., n.d.
Histoire de l’Empire Mexicain representée par figures. Paris, 1696.
Hittell (John S.), History of Culture. New York, 1875.
Holmes (E. B.), Speech on Mexican War, June 18, 1846.
Washington, 1846.
Hospicio de Pobres, Ordenanzas para el gobierno. Mex., 1806;
Sobre Establecimiento del. Mex., 1774.
Hospicios de Filipinas, Documentos Interesantes para saber el
orígen de los bienes, etc. Mexico, 1832.
Hospital Civil de Valencia, Reglamento general. n.pl., n.d.
Hospital del Divino Salvador, Manifiesto que la Junta de
Beneficencia del. Mexico, 1844.
Hospital de Indios, Constituciones. Mexico, 1778.
Hospital para la Tropa, Instruccion y metodo con que se ha de
establecer. Mexico, 1774. 4to.
Hospital Real y general, Constituciones, y Ordenanzas. Mexico,
1778. 4to.
Houston, Speech in United States Senate April 20, 1858. n.pl.,
n.d.
Houston (Sam), Letter to General Santa Anna. Washington, 1852.
Houston (Sam), Speech in United States Senate, February 19,
1847.
Howard (Volney E.), Speech in U. S. House of Rep., July 6, 1852.
Washington, 1852.
Howitt (Mary), History of the U. S. New York, 1860. 2 vols.
Huasteca, Noticias Estadísticas de la Huasteca. Mexico, 1869.
Hudson (Charles), Speech in U. S. House of Rep., December 16,
1846; Feb. 13, 1847. n.pl., n.d.
Huerta (Epitacio), Apuntes para servir á la Historia de los
Defensores de Puebla. Mexico, 1868.
Huesca (Manuel Gutierrez de), Respiracion de gratitud que un
Presbytero Americano. MS.
Humboldt (Alex. de), Essai Politique sur le Royaume de la
Nouvelle Espagne. Paris, 1811. folio. 2 vols. and atlas.
Humboldt (Alex. de), Examen Critique de l’histoire de la
Géographie du Nouveau Continent. Paris, 1836-9. 5 vols.
Humboldt (Alex.), Letters to Varnhagen von Ense. New York,
1860.
Humboldt (Alex.), Life, Travels and Books. New York, 1859.
Humboldt (Alex.), Political Essay on New Spain. Translated by
John Black. London, 1814.
Humboldt (Alex.), Tablas Estadísticas del Reyno de Nueva
España en el año de 1803. n.pl., n.d.
Humboldt (Alex. de), Versuch über den politischen Zustand des
Königreichs Neu-Spanien. Tübingen, 1809.
Humboldt (Alex.), Volcans des Cordillères de Quito et du Mexique.
Paris, 1854.
Ibañez (Manuel de), Coleccion de poesías escogidas. Querétaro,
1802. MS.
Ibar (Francisco), Muerte política de la República Mexicana; and
Regeneracion política de la República Mexicana. Mexico, 1829-
30. 2 vols.
Icazbalceta (Joaquin García), Coleccion de Documentos para la
Historia de Mexico. Mexico, 1858-66. folio. 2 vols.
Idea Mercurial y descripcion breve de la plausible jura. Mexico,
1761. 4to.
Ideas necesarias á todo pueblo Americano. Philadelphia, 1821.
Ideas necesarias á todo pueblo Americano que quiere ser libre.
Puebla, 1823.
Iglesia de Chiapas, Observaciones que hace la Iglesia Catedral
del. Mexico, 1826.
Iglesia Catedral de la Puebla de Los Angeles, Reglas y
Ordenanzas del choro. Puebla, 1731.
Iglesia de Guadalaxara, Fundacion y descripcion. Mexico, 1865.
Iglesia Metropolitana de Mexico, Carta pastoral Sept. 10, 1811.
Mexico, 1811.
Iglesia Parroquial de San Miguel, Privilegios y Gracias singulares
que goza esta. Mexico, 1844.
Iglesias (Antonio de San Miguel), Relacion sencilla del funeral y
exequias. Mexico, 1805.
Iglesias (J. M.), Revistas Históricas sobre la Intervencion
Francesa. Mexico, 1867. 3 vols.
Iglesias (José M.), Estudio Constitucional sobre facultades de la
Corte de Justicia. Mex., 1874; Manifiesto del Presidente Interino
sobre las Negociaciones con Diaz. [Querétaro, 1876.]
Iglesias y Conventos de Mexico, Relacion descriptiva. Mexico,
1863.
Iguala, Acta Celebrada en. Mexico, 1821.
Igualdad, Cinco artículos sobre. Mexico, 1850.
Ilustracion Mexicana (La). Mexico, 1851-3. 4 vols.
Immaculada Cõcepcion, Regla y ordinaciones de las Religiosas.
Mexico, 1700.
Imperio de Mexico. A collection relating to Maximilian’s Empire.
Importantes Observaciones sobre los Gravísimos Males en que se
va á ver Envuelta la Nacion como resultado del decreto de 10
del Actual. Mexico, 1846.
Impugnacion á las Cartas de D. J. M. Gutierrez Estrada sobre el
Proyecto de Establecer en Mejico una Monarquía moderada.
Mejico, 1840.
Incitativa de un Español Americano á todos los Españoles
Ultramarinos que se hallan en la peninsula. Mexico, 1820.
Incitativa de un Mejicano á todos los Españoles en defensa de la
que se publicó en la peninsula. Mexico, 1820.
Indicacion del Orígen de los Extravios del Congreso Mexicano.
Mexico, 1822.
Indicador (El), de la Federacion Mejicana. Mejico, 1833-4. 3 vols.
Indulgencias perpetuas concedidas á los congregantes. Mexico,
1793.
Industria Nacional, Reglamento de una sociedad para el fomento.
Mex., 1839; Representacion dirigida al Presidente Provisional.
Mej., 1843.
Infante (Joaquin), Solucion á la cuestion de derecho sobre la
Emancipacion de la América. Cádiz, 1821.
Informacion sobre aranceles en las doctrinas. MS. 1749. folio.
Informe de la Comision nombrada para el reconocimiento del
teatro de Santa-Anna. Mexico, 1843.
Informe Crítico-legal dado al muy ilustre y venerable Cabildo.
Mexico, 1835.
Informe en Estrados en Defensa de los Empleados de la Aduana
Marítima de San Blas. Mexico, 1843.
Informe Secreto al Pueblo Soberano. Mexico, 1833.
Informes en derecho, a Collection.
Inge (S. W.), Speech in U. S. House of Rep., March 22, 1848.
n.pl., n.d.
Inigo (Josef), Funeral Gratitud con que la religiosa comunidad.
Puebla, 1774.
Inquisicion, a Collection.
Inquisicion, Apología. Mexico, 1811.
Inquisicion, Informe sobre el Tribunal. Mexico, 1813.
Inquisicion, Memorial Santa. Mexico, 1821.
Inquisicion de España, Anécdota importante. Mejico, 1820.
Inquisicion Mexicana, Epítome Sumario de las Personas, assí
vivas, como difuntas, que se han penitenciado. [Mexico], 1659.
Inquisicion y pública declaracion testamento y última voluntad de
la Santa Inquisicion. Mexico, 1850.
Inquisidores fiscales, Reglas y Constituciones. Mexico, 1659. 4to.
Inquisidores, Nos los Inquisidores contra la herética pravedad, y
apostasía, en esta ciudad. Mexico, 1738.
Instituta Congregationis Oratorii S. Mariae in Vallicella de Urbe.
Mexici, 1830.
Instituto de Ciencias, Literatura y Artes, Memorias. Mex., 1826;
Reglamento. Mex., 1825.
Instituto Nacional de Geografía y Estadística, Boletin. See
Sociedad Mexicana, etc., its later name.
Instituto Religioso, Defensa. Mexico, 1820.
Instruccion de los comisionados de la direccion general y juzgado
privativo de Alcabalas y Pulques del Reyno. Mexico, 1783. 4to.
Instruccion, Formada en virtud de Real Órden de S. M., que se
dirige al Señor Comandante General de Provincias Internas
Don Jacobo Ugarte y Loyola. Mexico, 1786.
Instruccion para la Infantería ligera Ejército Mexicano. Mexico,
1841.
Instruccion pastoral del ilustrísimo Arzobispo de Paris. Mexico,
1822.
Instruccion Provisional para los Comisarios Generales que han de
Administrar los Ramos de la Hacienda Pública. Mexico, 1824.
Instruccion que se remite de órden de Su Majestad á las Juntas
de Temporalidades establecidas en los dominios de Indias.
Madrid, 1784.
Instruccion sobre el cultivo de moreras y cria de gusanos. Mexico,
1830. 4to.
Instrucciones de los Vireyes de Nueva España. Mexico, 1867.
Insurreccion de 1810. A Collection.
Insurreccion de Nueva España, Resúmen Histórico. Mexico, 1821.
Intendentes de Exército y Provincia, Real Ordenanza. Madrid,
1786.
Intendentes de Provincias y Exércitos, Ordenanza de 13 de
Octubre de 1749. Madrid, 1749. folio.
Intervencion Europea en Mejico. Paris, 1859; Philadelphia, 1859.
Interference of the British Government in the dispute between
Spain and her American colonies. MS.
Iriarte (Francisco), Contestacion á la expresion de agravios. Mex.,
1832; Manifiesto á los pueblos de la República Mejicana.
Mejico, 1829.
Iriarte (Francisco Suarez), Defensa, 21 de Marzo de 1850.
Mexico, 1850.
Iris Español (El). Mexico, 1820 et seq.
Isabel la Católica, Institucion de la Real Órden Americana.
Mexico, 1816.
Itta (Joseph Mariano Gregorio), Dia festivo Propio para el Culto.
Mexico, 1744.
Iturbide (Agustin), Breve diseño crítico de la emancipacion y
libertad de la nacion Mexicana. Mexico, 1827.
Iturbide (Agustin), Breve manifiesto. Mexico, 1821.
Iturbide (Agustin), Carrera Militar y Política. Mexico, 1827.
Iturbide (Agustin), Carta al Pensador Mexicano. Mexico, 1821.
Iturbide (Agustin), Cartas de los Señores Generales. Mexico,
1821.
Iturbide (Agustin), Catástrofe de el 18 de Mayo de 1822. Mexico,
1826; Paris, 1825.
Iturbide (Agustin), Del grande Iturbide. Mexico, 1838.
Iturbide (Agustin), Denkwürdigkeiten aus dem öffentlichen Leben
des Kaisers von Mexico. Leipzig, 1824.
Iturbide (Agustin), El generalísimo Almirante á los habitantes del
Imperio. Mexico, 1822.
Iturbide (Agustin de), El generalísimo Almirante á sus
conciudadanos. Mexico, 1822.
Iturbide (Agustin de), El Primer gefe del exército imperial. Mexico,
1821.
Iturbide (Agustin de), Entrada Pública en Valladolid. Mexico, n.d.
Iturbide (Agustin), Mémoires Autographes. Paris, 1824.
Iturbide (Agustin), Mejicanos. Puebla, 1821.
Iturbide (Agustin), Plan publicado en Iguala el 24 de Febrero de
1821. Mexico, 1821.
Iturbide (Agustin), Poblanos ilustres. Puebla, n.d.
Iturbide (Agustin), Primer premio general concedido á los señores
gefes y oficiales. Mexico, 1821.
Iturbide (Agustin), Proclama El primer gefe del ejército imperial de
las tres garantías. Mexico, 1821.
Iturbide (Agustin), Proclamacion, dated July 14, 1821. Valladolid,
1821.
Iturri (Francisco), Carta Crítica sobre la historia de América del
Señor Juan Bautista Muñoz. Madrid, 1798; Puebla, 1820.
Ixtlilxochitl (Fernando de Alva), Cruautés Horribles des
Conquérants du Mexique. In Ternaux-Compans, Voy., série i.,
tom. viii.
Ixtlilxochitl, Relaciones. In Kingsborough’s Mex. Antiq., vol. ix.
Izaguirre (Pedro de), Relacion de los sitios y calidades de los
puertos de Caballos. MS. 1604-5. folio.
Jacob (William), Production and Consumption of the Precious
Metals. London, 1831.
Jaillandier (P.), Extrait d’une Lettre datée de Pondichera au mois
de Fevrier. Pondichera, 1711. MS.
Jalisco, Arancel para el cobro de derechos judiciales y curia
eclesiástica. Guad., 1825; Buen Tratamiento de Indios. In
Pacheco and Cárdenas, Col. Doc., tom. xiv.; Comision
Permanente. Guad., 1867; Constitucion política. Guad., 1824;
Contestaciones habidas. Guad., 1825. 4to; Discurso del
Gobernador. n.pl. [1867]; Documentos Importantes. Mex., 1868;
Documentos oficiales de la Comandancia. Guad., 1834. 4to;
Documentos Oficiales de la Revolucion. Guad., 1852; Dictámen
presentado por la comision de Hacda. Guad., 1829; Esposicion
á Congreso. Mex., 1832. 4to; Memoria histórica de los sucesos
mas notables de la conquista particular de Jalisco. Guad., 1833;
Memoria presentada por el Gobernador. Guad., 1857; Memoria
que el Exmo Gob. del Estado el dia 1o de Sept. de 1847. Guad.,
1848; Memoria sobre el Estado actual de la administracion
Pública. Guad., 1826; Noticias Estadísticas. Guad., 1843;
Patriótica Iniciativa. Guad., 1844; Reglamento para el gov.
interior del Cong. Const. [Guad.] 1824; Revolucion de. Guad.,
1852. 4to; Santos Degollado. Guad., 1859; Voto general de los
pueblos de la Provincia. [Mexico], 1823.
Jay (Guillermo), Revista de las causas y consecuencias de la
Guerra con Mexico. Mexico, 1850.
Jay (William), Mexican War. Boston, 1849.
Jenkins (John S.), History of Mexican War. Auburn, 1851; New
York, 1859.
Jesuitas, Cartas escritas por el Rey al papa, extermino de los
jesuitas. MS. 1764-7.
Jesuitas, Colegio de San Francisco Xavier noviciado en
Tepotzotlan. MS. 1770. folio.
Jesuitas, Constitucion secreta. Mexico, 1823.
Jesuitas, Documentos y Obras importantes. Mexico, 1841. 2 vols.
Jesuitas, Instrucciones Secretas. MS.
Jesus María (Nicolás de), La Santidad Derramada. Mexico, 1748.
Jesus de Atocha Silver Mining Co., Mex. San Francisco, 1863.
Jimenez (Lázaro), Inquisidor fiscal de este Santo Oficio. MS.
Jimenez (Manuel María), Apología de la Conducta Militar de
General Santa-Anna. Mex., 1847.
Jimenez y Frias (Jos. Ant.), El Fénix de los Mineros ricos de la
América. Mexico, 1779. 4to.
