Using Hidden Markov Model to Tag Bangla Parts of Speech

Irene Sultana, Anthony Ovishek Baroi, Md. Tabassenur Rahman

Dept. of Computer Science Engineering, University of Liberal Arts Bangladesh, Dhaka, Bangladesh

1. Introduction:
In natural language processing, part-of-speech (POS) tagging is the task of assigning a part of speech to each token of a sentence. There are several methods for POS tagging, and among them probability-based taggers are the most effective. The Hidden Markov Model (HMM) is a probabilistic model that can classify the parts of speech in a sentence. In this report we describe POS tagging of Bangla using an HMM. Before describing the tagging project we have to introduce several terms, such as supervised learning, unsupervised learning and stochastic tagging. POS tagging, or grammatical tagging, is carried out in computational linguistics using algorithms that work over discrete categories. POS tagging algorithms can be classified into two forms: rule-based and stochastic. One of the first and most widely used rule-based POS taggers was developed by Eric Brill in his 1993 PhD thesis and is popularly known as the Brill tagger.

Before diving into the Hidden Markov Model, we first have to know about the Markov model. When no language existed for communicating with others, the only way was sign language. That is how we communicate with our pets, such as a dog at home. When we call him, "Kalu, sit down", he sits down and responds by wagging his tail. This does not mean he understands our language; he simply understands emotions and body language more than words. As human beings we have developed many natural languages, more than any other animal on this earth, so when we speak in different sentences we can differentiate two phrases, and our responses differ. We want to teach our machines in the same way: if our future robot dog hears "Kalu, sit down", it should know that "sit" is a verb and obey the command by sitting. This is just an example of how teaching a robot to communicate with us in a language makes things easier.

2. Literature Review:
Part-of-speech tagging is a key problem in several areas of linguistic analysis. POS tagging research for Bangla began in the 2000s and includes rule-based systems, statistical models and unsupervised models. Different approaches are used for POS tagging, such as rule-based, stochastic and transformation-based learning approaches.
Rule-based taggers try to assign a tag to every word using a set of hand-written rules. For example, a word following a determiner and an adjective should be a noun.

There is another approach, the transformation-based approach, which combines the rule-based approach and the statistical approach. One example of an effective tagger of this kind is the Brill tagger [4, 5, 6]. All the approaches mentioned so far are supervised POS tagging, where a pre-tagged corpus is a requirement. On the other hand, there is unsupervised POS tagging [7, 8, 9], which does not need any pre-tagged corpora.

3. Proposed Solution:
To model any problem with a Hidden Markov Model we need a set of observations and a set of possible states. In an HMM the states are hidden. In the part-of-speech tagging problem the observations are the words in the given sequence, and the hidden states are the POS tags of those words. A transition probability answers a question such as: what is the probability of the current word having a verb-phrase tag given that the previous tag was a noun phrase, written P(VP | NP). An emission probability such as P(girl | NP) or P(will | VP) is the probability of observing the word "girl" given that the tag is a noun phrase. Note that this is just an informal modelling of the problem to provide a basic understanding of how the POS tagging problem can be solved using an HMM.
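To make this concrete, the two components of an HMM tagger can be written down as probability tables: a tag-to-tag transition table and a tag-to-word emission table. The sketch below is only an illustration in Python, and the probability values are invented placeholders; the actual values would be estimated from a tagged corpus as described in Section 3.3.

# Informal sketch of the two probability tables of an HMM POS tagger.
# The numbers below are invented placeholders, not estimates from real data.

states = ["NP", "VP"]              # hidden states: the POS tags
observations = ["girl", "will"]    # observed symbols: the words

# Transition probabilities P(current tag | previous tag)
transition = {
    "NP": {"NP": 0.3, "VP": 0.7},  # e.g. P(VP | NP) = 0.7
    "VP": {"NP": 0.6, "VP": 0.4},
}

# Emission probabilities P(word | tag)
emission = {
    "NP": {"girl": 0.8, "will": 0.2},  # e.g. P(girl | NP) = 0.8
    "VP": {"girl": 0.1, "will": 0.9},  # e.g. P(will | VP) = 0.9
}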

3.1 Idea:
Our proposed solution is a mix of supervised and unsupervised learning for training an HMM (Hidden Markov Model). It assumes that the part-of-speech tag of a word depends on the present word and the past word. In this method a word is taken as input and the tagger returns all possible tags for that word; it therefore needs the set of all possible tags for a given word. The POS tagging problem is stated as follows: given a sequence of words w1 ... wn, we have to find the corresponding sequence of tags t1 ... tn, taken from a set of tags T, that satisfies:

S = argmax Π P(wi | ti) × P(ti | ti-1)

For each model, the model parameters are estimated from the given training data. The goal of a Hidden Markov Model tagger is to maximize P(word | tag) × P(tag | previous n tags).
Here,
P(word | tag) = lexical information.
This is the word/lexical probability: the probability that, given this tag, we observe this word. It is not the probability that this word has this tag. It is modelled through the language model (word-tag matrix).
P(tag | previous n tags) = language information.
This is the tag sequence probability: the probability that this tag follows these previous tags. It is modelled through the language model (tag-tag matrix).
If we look at the previous (n-1) tags to find the current tag, we have an n-gram model. In the bigram model we choose the most probable tag ti for word wi given the previous tag ti-1 and the current word wi. In the unigram model (just the most likely tag) we choose the most probable tag ti for word wi given only the word wi itself.
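The formula above can be turned into a small scoring routine. The following is a minimal Python sketch, assuming the emission and transition probabilities are available as nested dictionaries (how they are estimated is shown in Section 3.3); a real tagger would use the Viterbi algorithm instead of scoring every candidate sequence.

def sequence_score(words, tags, emission, transition, start="S"):
    # score(tags) = product over i of P(w_i | t_i) * P(t_i | t_{i-1})
    score = 1.0
    prev = start  # dummy tag before the first word (the S tag of Section 3.3)
    for word, tag in zip(words, tags):
        score *= transition.get(prev, {}).get(tag, 0.0)  # P(t_i | t_{i-1})
        score *= emission.get(tag, {}).get(word, 0.0)    # P(w_i | t_i)
        prev = tag
    return score

def best_tagging(words, candidate_taggings, emission, transition):
    # Brute-force argmax over a small set of candidate tag sequences,
    # purely to illustrate the argmax in the formula above.
    return max(candidate_taggings,
               key=lambda tags: sequence_score(words, tags, emission, transition))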

3.2 Example for English language:


"Race" can be tagged VB or NN. "Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/ADV". "People/NNS continue/VBP to/TO inquire/VB the/DT reason/NN for/IN the/DT race/NN for/IN outer/JJ space/NN".
Let us tag the word "race" in the first sentence with a bigram model. Assuming the previous words are already tagged, we have "Secretariat/NNP is/VBZ expected/VBN to/TO race/?? tomorrow". P(race | VB) × P(VB | TO) = ? Given that the tag is VB, how likely is the current word to be "race"? Given that the previous tag is TO, how likely is the current tag to be VB? P(race | NN) × P(NN | TO) = ? Given that the tag is NN, how likely is the current word to be "race"? Given that the previous tag is TO, how likely is the current tag to be NN?
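As a numerical illustration of this comparison, the short script below plugs in placeholder probability values (they are not taken from any particular corpus; any corpus-estimated values could be substituted) and picks the tag with the larger product.

# Disambiguating "race" after "to/TO": compare the two candidate tags.
# All probability values here are illustrative placeholders only.

p_race_given_vb = 0.00012   # emission P(race | VB)
p_vb_given_to = 0.83        # transition P(VB | TO)
p_race_given_nn = 0.00057   # emission P(race | NN)
p_nn_given_to = 0.00047     # transition P(NN | TO)

score_vb = p_race_given_vb * p_vb_given_to  # score for tagging "race" as VB
score_nn = p_race_given_nn * p_nn_given_to  # score for tagging "race" as NN

print("VB score:", score_vb)
print("NN score:", score_nn)
print("chosen tag:", "VB" if score_vb > score_nn else "NN")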

3.3 Process:
Example for the Bangla language:
1) মেরি, জেন উইল কে দেখতে পারে। (Mary, Jane can see Will.)
2) সোফিয়া মেরিকে দেখতে পাবে। (Sophia will see Mary.)
3) জেন কি মেরিকে শনাক্ত করতে পারবে? (Will Jane be able to spot Mary?)
4) মেরি সোফিয়াকে মৃদু চাপড় দিবে। (Mary will pat Sophia.)

Words            N=বিশেষ্য    M=সাহায্যকারী ক্রিয়া    V=ক্রিয়া

1. মেরি              4               0                    0
2. সোফিয়া           2               0                    0
3. জেন               2               0                    0
4. উইল               1               0                    0
5. দেখতে             0               1                    0
6. পারে              0               0                    1
7. শনাক্ত            1               0                    0
8. করতে              0               1                    0
9. মৃদু              0               0                    1
10. চাপড়            1               0                    0
11. দিবে             0               0                    1
12. পারবে            0               1                    0

Now, divide each entry in a column by the total number of appearances of that tag. For example, the noun tag appears 11 times in the table above, so each term in the noun column is divided by 11. We get the following table after this operation.

Words            N=বিশেষ্য    M=সাহায্যকারী ক্রিয়া    V=ক্রিয়া

1. মেরি             4/11             0                    0
2. সোফিয়া          2/11             0                    0
3. জেন              2/11             0                    0
4. উইল              1/11             0                    0
5. দেখতে             0              1/3                   0
6. পারে              0               0                   1/3
7. শনাক্ত           1/11             0                    0
8. করতে              0              1/3                   0
9. মৃদু              0               0                   1/3
10. চাপড়           1/11             0                    0
11. দিবে             0               0                   1/3
12. পারবে            0              1/3                   0

From the above table, we infer that

The probability that মেরি is বিশেষ্য (Noun) = 4/11

The probability that দেখতে is সাহায্যকারী ক্রিয়া (Modal) = 1/3


The probability that উইল is বিশেষ্য (Noun) = 1/11

The probability that করতে is সাহায্যকারী ক্রিয়া (Modal) = 1/3


In a similar manner, we can figure out the rest of the probabilities. These are the emission
probabilities.
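The same emission probabilities can be computed mechanically from any tagged corpus by counting word-tag pairs and normalising by the tag totals. A minimal sketch, assuming the corpus is given as a list of sentences, each a list of (word, tag) pairs:

from collections import defaultdict

def emission_probabilities(tagged_sentences):
    # tagged_sentences: e.g. [[("মেরি", "N"), ("দেখতে", "M"), ("পারে", "V")], ...]
    word_tag_counts = defaultdict(lambda: defaultdict(int))
    tag_counts = defaultdict(int)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            word_tag_counts[tag][word] += 1
            tag_counts[tag] += 1
    # P(word | tag) = count(word tagged as tag) / count(tag)
    return {tag: {word: count / tag_counts[tag]
                  for word, count in words.items()}
            for tag, words in word_tag_counts.items()}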

Next, we have to calculate the transition probabilities, so define two more tags S and E. S is placed at
the beginning of each sentence and E at the end as shown in the figure below. N=noun, M=modal,
V=verb
S মেরি/N জেন/N উইল/N কে দেখতে/M পারে/V E

S সোফিয়া/N মেরিকে/N দেখতে/M পাবে/V E

S জেন/N কি মেরিকে/N শনাক্ত/N করতে/M পারবে/V E

S মেরি/N সোফিয়াকে/N মৃদু/N চাপড়/N দিবে/V E

Let us again create a table and fill it with the co-occurrence counts of the tags.

In the figure above, the S tag is followed by the N tag four times, so the first entry in the S row is 4. The modal tag (M) follows S zero times, so the second entry is 0. The rest of the table is filled in the same manner.

From \ To    N=বিশেষ্য    M=সাহায্যকারী ক্রিয়া    V=ক্রিয়া    E

S                4               0                    0         0
N                8               3                    1         0
M                0               0                    3         0
V                0               0                    0         4

Next, we divide each term in a row of the table by the total number of co-occurrences of the tag in
consideration.

From \ To    N=বিশেষ্য    M=সাহায্যকারী ক্রিয়া    V=ক্রিয়া    E

S               4/4              0                    0         0
N               8/12            3/12                 1/12       0
M                0               0                   3/3        0
V                0               0                    0        4/4

These are the respective transition probabilities for the above four sentences. Now how does the
HMM determine the appropriate sequence of tags for a particular sentence from the above tables? Let us find out.
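The transition counts and probabilities in the table above can also be computed with a short script. A minimal sketch, assuming each training sentence is supplied as the list of its tags (the S and E boundary tags are added inside the function):

from collections import defaultdict

def transition_probabilities(tag_sequences):
    # tag_sequences: e.g. [["N", "N", "N", "M", "V"], ["N", "N", "M", "V"], ...]
    bigram_counts = defaultdict(lambda: defaultdict(int))
    prev_counts = defaultdict(int)
    for tags in tag_sequences:
        padded = ["S"] + list(tags) + ["E"]      # add start and end tags
        for prev, cur in zip(padded, padded[1:]):
            bigram_counts[prev][cur] += 1
            prev_counts[prev] += 1
    # P(current tag | previous tag) = count(previous, current) / count(previous)
    return {prev: {cur: count / prev_counts[prev]
                   for cur, count in following.items()}
            for prev, following in bigram_counts.items()}

# The four tagged sentences above, written as tag sequences:
corpus_tags = [
    ["N", "N", "N", "M", "V"],
    ["N", "N", "M", "V"],
    ["N", "N", "N", "M", "V"],
    ["N", "N", "N", "N", "V"],
]
# transition_probabilities(corpus_tags) gives, for example,
# P(N | S) = 4/4 and P(N | N) = 8/12, matching the table above.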
Take a new sentence and tag it with deliberately wrong tags. Let the sentence "উইল পারে করতে মেরি" be tagged as:

 উইল as সাহায্যকারী ক্রিয়া (modal)
 পারে as ক্রিয়া (verb)
 করতে as বিশেষ্য (noun)
 মেরি as বিশেষ্য (noun)

Now calculate the probability of this sequence being correct in the following manner.

Tag path: S → M → V → N → N → E for the words উইল পারে করতে মেরি

Transition probabilities: P(M|S) = 0, P(V|M) = 1, P(N|V) = 0, P(N|N) = 8/12, P(E|N) = 0
Emission probabilities: P(উইল|M) = 0, P(পারে|V) = 1/3, P(করতে|N) = 0, P(মেরি|N) = 4/11

The probability that the modal tag (M) comes after the tag S is 0, as seen in the transition table. Likewise, the probability that the word উইল is a modal is 0. In the same manner, we calculate every probability along the path. The product of these probabilities is the likelihood that this sequence is right. Since the tags are not correct, the product is zero:

0 × 0 × 1 × 1/3 × 0 × 0 × 8/12 × 4/11 × 0 = 0

When a sentence is tagged correctly, we get a probability greater than zero. For example, tag the sentence "উইল মেরিকে চাপড় দিবে" as shown below.

Tag path: S → N → N → N → V → E for the words উইল মেরিকে চাপড় দিবে

Transition probabilities: P(N|S) = 1, P(N|N) = 8/12, P(N|N) = 8/12, P(V|N) = 1/12, P(E|V) = 1
Emission probabilities: P(উইল|N) = 1/11, P(মেরিকে|N) = 4/11, P(চাপড়|N) = 1/11, P(দিবে|V) = 1/3

Calculating the product of these terms we get,


1 × 1/11 × 8/12 × 4/11 × 8/12 × 1/11 × 1/12 × 1/3 × 1 ≈ 0.0000371,
which is greater than 0. This means the tag sequence is accepted as correct.
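The hand calculation above can be expressed as one function that walks the tag path from S to E, multiplying a transition and an emission probability at every word. A minimal sketch, reusing the emission and transition dictionaries built in the earlier sketches:

def tagging_probability(words, tags, emission, transition):
    # P(tagging) = P(t1|S) * P(w1|t1) * P(t2|t1) * P(w2|t2) * ... * P(E|tn)
    prob = 1.0
    prev = "S"
    for word, tag in zip(words, tags):
        prob *= transition.get(prev, {}).get(tag, 0.0)  # transition step
        prob *= emission.get(tag, {}).get(word, 0.0)    # emission step
        prev = tag
    prob *= transition.get(prev, {}).get("E", 0.0)      # closing transition to E
    return prob

# The wrong tagging of "উইল পারে করতে মেরি" (M V N N) evaluates to 0, while the
# correct tagging of "উইল মেরিকে চাপড় দিবে" (N N N V) evaluates to about
# 0.0000371, matching the calculations above.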

4. Conclusion:
In summary, the model described above is simple and efficient for POS tagging of natural language text, even when the amount of available text is very small. The Hidden Markov Model (HMM) is a technique by which we can tag Bangla parts of speech. Using this method, natural language processing for Bangla can be improved in many respects. Other languages, such as Hindi, Arabic and Indonesian, can also be POS tagged using this method.

5. References:
[1]https://www.connectedpapers.com/main/6efbaa888f87b4a4b02cebb029190227f5ce7c57/Implementing
-a-Part-of-Speech-Tagger-with-Hidden-Markov-Models/graph
[2]https://www.researchgate.net/publication/228530415_Part_of_Speech_Tagging_for_Bengali_with_Hi
dden_Markov_Model
[3] E. Brill, “A simple rule-based part of speech tagger”, In Proceedings of the Third Conference on Applied Natural Language Processing, ACL, Trento, Italy, 1992.
[4] E. Brill, “Automatic grammar induction and parsing free text: A transformation-based approach”, In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, Columbus, Ohio, 1993.
[5] E. Brill, “Transformation based error driven parsing”, In Proceedings of the Third International Workshop on Parsing Technologies, Tilburg, The Netherlands, 1993.
[6] E. Brill, “Some advances in rule-based part of speech tagging”, In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), Seattle, Washington, 1994.
[7] R. Prins and G. van Noord, “Unsupervised POS-Tagging Improves Parsing Accuracy and Parsing Efficiency”, In Proceedings of the International Workshop on Parsing Technologies, 2001.
[8] M. Pop, “Unsupervised Part-of-speech Tagging”, Department of Computer Science, Johns Hopkins University, 1996.
[9] E. Brill, “Unsupervised Learning of Disambiguation Rules for Part of Speech Tagging”, In Natural Language Processing Using Very Large Corpora, Boston, MA, 1997.
