PROJECT REPORT

DESIGN OF EXPERIMENT PROJECT REPORT


ACCURACY OF POS-TAGGING ALGORITHMS FOR DIFFERENT
NEWSPAPERS
Kinshuk Kaura and Diksha Singh
Department of Statistics, Ram Lal Anand College, University of Delhi
(Done under the guidance of Professor Dr. Seema Gupta)

1.1 ABSTRACT
There are different approaches to the problem of assigning a part-of-speech tag to each
word of a text, which is known as Part-Of-Speech (POS) tagging. In this paper we compare
the performance of a few POS-tagging techniques: the statistical approach (n-gram, HMM),
the transformation-based approach (Brill's tagger), and the Baum-Welch approach. A
supervised POS-tagging approach requires a large annotated training corpus to tag
properly. At this initial stage of POS tagging, we tried to see which technique maximizes
performance with this limited resource.

1.2 INTRODUCTION
1.2.1 Part-Of-Speech Tagging
POS tagging refers to labelling each word with the part of speech that best describes its
use in the given sentence. The part of speech refers to the grammatical role of a word in
a given sentence.

However, the POS tag of a word can vary depending on the context in which it is used.
One way to handle this is to make use of the context in which the word occurs. For some
POS tags there are rules dictating which tags should follow or precede them in a
sentence. For example, a word that occurs between a determiner and a noun should be an
adjective.


1.2.2 Tag set for Part Of Speech Tagging:

1.3 METHODOLOGY
1.3.1 Hidden Markov Model
A Hidden Markov Model (HMM) is a probabilistic sequence model that computes the
probabilities of possible sequences and selects the sequence with the maximum
probability. Sometimes, what we want to predict is a sequence of states that is not
directly observable in the environment. However, we are given another sequence of states
that is observable, and the hidden states have some dependence on these observable
states. In POS tagging, the words of a sentence are the observable states (given to us in
the data) while their POS tags are the hidden states, and hence we use an HMM for
estimating POS tags. In what follows, the observable states are called 'observations' and
the hidden states simply 'states'.


Our objective is to find the tag sequence that maximizes the joint probability of the
tags and the observed words.
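
Written out explicitly, this is the standard bigram-HMM tagging objective (a reconstruction given here since the original diagram is not reproduced; w_i denotes the i-th word and t_i its tag):

$$
\hat{t}_{1:n} \;=\; \arg\max_{t_{1:n}} P(t_{1:n}, w_{1:n})
\;=\; \arg\max_{t_{1:n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
$$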

1.3.2 Tagging Algorithms

Rule-based taggers: disambiguation is done by analysing the linguistic features of the word,
of its preceding word, of its following word, and other aspects.
Stochastic taggers: the tag most frequently seen with a word in the training set is the one
assigned to an ambiguous instance of that word.
Transformation-based taggers: these allow linguistic knowledge to be expressed in a readable
form and transform one state into another by applying transformation rules. They draw
inspiration from both of the previously explained approaches, rule-based and stochastic.


1.3.3 Viterbi Algorithm


For any model, such as an HMM, that contains hidden variables, the task of
determining which sequence of hidden variables is the underlying source of some
sequence of observations is called the decoding task. The most common decoding
algorithm for HMMs is the Viterbi algorithm.
The Viterbi algorithm is a dynamic programming algorithm for finding the most
likely sequence of hidden states, called the Viterbi path, that results in the given
sequence of observed events.
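
A minimal sketch of the Viterbi recursion for POS tagging is given below. It assumes that start, transition, and emission probabilities have already been estimated from a tagged corpus; the function and variable names, the smoothing floor, and the toy tag set in the usage comment are illustrative and not taken from the project code.

```python
import numpy as np

def viterbi(obs, tags, start_p, trans_p, emit_p):
    """Return the most likely tag sequence (Viterbi path) for one sentence.

    obs     : list of words
    tags    : list of possible POS tags (hidden states)
    start_p : dict tag -> P(tag at position 0)
    trans_p : dict (prev_tag, tag) -> P(tag | prev_tag)
    emit_p  : dict (tag, word) -> P(word | tag); unseen pairs get a tiny floor
    """
    floor = 1e-12            # smoothing floor for unseen emissions/transitions
    V = [{}]                 # V[t][tag] = best log-probability of a path ending in tag at time t
    back = [{}]              # back-pointers for recovering the best path

    for tag in tags:
        V[0][tag] = np.log(start_p.get(tag, floor)) + np.log(emit_p.get((tag, obs[0]), floor))
        back[0][tag] = None

    for t in range(1, len(obs)):
        V.append({}); back.append({})
        for tag in tags:
            best_prev, best_score = max(
                ((prev, V[t - 1][prev] + np.log(trans_p.get((prev, tag), floor)))
                 for prev in tags),
                key=lambda x: x[1])
            V[t][tag] = best_score + np.log(emit_p.get((tag, obs[t]), floor))
            back[t][tag] = best_prev

    # Trace back from the best final tag to recover the Viterbi path
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Toy usage with made-up probabilities (illustrative only):
# tags = ["DT", "NN", "VB"]
# print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))
```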

1.3.4 Brill Tagger


Brill's learning algorithm is a rule-based tagger that uses a series of rules to correct
the results of an initial tagger. The rules it follows are score-based: the score of a
rule is equal to the number of errors it corrects minus the number of new errors it
produces. The algorithm first assigns every word its most likely part of speech, i.e. the
most common tag for that word. This initial annotation is compared to a hand-annotated
corpus, and a list of errors is produced. For each error, rules to correct the error are
instantiated from a set of rule templates. Each instantiated rule is evaluated by
computing its impact on the whole corpus. The rules are compared by assigning each rule a
score, which is the difference between the number of good transformations and the number
of bad transformations the rule produces. The rule with the highest score is applied to
the text and added to the result list. The transformed corpus is then used to generate a
new rule in the next iteration. The algorithm stops when a certain criterion has been
fulfilled, e.g. the error rate falls below a specified threshold.
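
One way to reproduce this training loop is with NLTK's Brill trainer, sketched below. The Penn Treebank sample, the unigram initial tagger, the brill24 template set, and the max_rules cut-off are illustrative choices and not necessarily those used in the project.

```python
import nltk
from nltk.corpus import treebank
from nltk.tag import UnigramTagger, brill, brill_trainer

nltk.download("treebank", quiet=True)

# Hand-annotated corpus, split into training and evaluation parts
tagged_sents = treebank.tagged_sents()
train_sents, test_sents = tagged_sents[:3000], tagged_sents[3000:]

# Initial annotator: assign each word its most frequent tag in the training data
initial_tagger = UnigramTagger(train_sents)

# Rule templates from which candidate correction rules are instantiated
templates = brill.brill24()

trainer = brill_trainer.BrillTaggerTrainer(initial_tagger, templates, trace=0)
brill_tagger = trainer.train(train_sents, max_rules=100)  # stop after 100 learned rules

# accuracy() on recent NLTK versions; older versions call this evaluate()
print(brill_tagger.accuracy(test_sents))
print(brill_tagger.rules()[:5])   # inspect the highest-scoring learned rules
```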


1.3.5 Baum-Welch Algorithm


Also known as the forward-backward algorithm, the Baum-Welch algorithm is a dynamic
programming approach and a special case of the expectation-maximization (EM) algorithm.
Its purpose is to tune the parameters of the HMM, namely the state transition matrix A,
the emission matrix B, and the initial state distribution π₀, so that the likelihood of
the observed data under the model is maximized. If you have an HMM that describes your
process, the Viterbi algorithm can turn a noisy stream of observations into a
high-confidence guess of what is going on at each timestep. If you have no labelled data
and no prior knowledge, you can use the Baum-Welch algorithm to fit an HMM. However,
because it converges only to a local maximum of the likelihood, the Baum-Welch algorithm
does not always give the right answer. On the other hand, if you do have some prior
knowledge and a little time, there are many ways to constrain or initialize the training
process.
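
A compact sketch of the Baum-Welch updates is shown below. It is illustrative only and not the project code; it uses unscaled forward/backward passes for readability, whereas a practical implementation would rescale or work in log space to avoid underflow on long word sequences.

```python
import numpy as np

def baum_welch(obs, n_states, n_symbols, n_iter=20, seed=0):
    """Fit HMM parameters (A, B, pi) to one observation sequence by EM.

    obs       : 1-D array of integer symbol indices (e.g. word ids)
    n_states  : number of hidden states (here: candidate POS tags)
    n_symbols : vocabulary size
    """
    rng = np.random.default_rng(seed)
    A = rng.random((n_states, n_states));  A /= A.sum(1, keepdims=True)   # transitions
    B = rng.random((n_states, n_symbols)); B /= B.sum(1, keepdims=True)   # emissions
    pi = np.full(n_states, 1.0 / n_states)                                # initial dist.
    T = len(obs)

    for _ in range(n_iter):
        # E-step: forward (alpha) and backward (beta) passes
        alpha = np.zeros((T, n_states))
        beta = np.zeros((T, n_states))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

        gamma = alpha * beta
        gamma /= gamma.sum(1, keepdims=True)           # P(state_t = i | obs)
        xi = np.zeros((T - 1, n_states, n_states))     # P(state_t = i, state_{t+1} = j | obs)
        for t in range(T - 1):
            xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]
            xi[t] /= xi[t].sum()

        # M-step: re-estimate pi, A, B from the expected counts
        pi = gamma[0]
        A = xi.sum(0) / gamma[:-1].sum(0)[:, None]
        obs_onehot = np.eye(n_symbols)[obs]            # T x n_symbols indicator matrix
        B = (gamma.T @ obs_onehot) / gamma.sum(0)[:, None]

    return A, B, pi
```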

1.3.6 RANDOMIZED BLOCK DESIGN


The presence of nuisance factors introduces systematic variation into a study. For example,
crops grown in the northern part of a field and those grown in the southern part are
exposed to different climatic conditions. Such factors should therefore be controlled
whenever possible. Controlling these nuisance factors by blocking reduces the experimental
error, thereby increasing the precision of the experiment, among other benefits. A
randomized block design is an experimental design in which the experimental units are
grouped into blocks and the treatments are randomly allocated to the experimental units
inside each block. When every treatment appears at least once in each block, we have a
complete randomized block design; otherwise, we have an incomplete block design. This
kind of design is used to minimize the effect of systematic error. Since the experimenter
is interested in the differences between treatments, the effects due to variation between
the different blocks should be eliminated.
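
As an illustration of how treatments are randomized within blocks, the sketch below shuffles the order of the three tagging algorithms separately inside each newspaper block; the allocation logic is a generic sketch, not the project's actual procedure.

```python
import random

algorithms = ["Viterbi", "Brill tagger", "Baum-Welch"]             # treatments
newspapers = ["BBC", "Hindustan Times", "New York Times",
              "Times of India"]                                    # blocks

random.seed(42)           # fixed seed so the layout is reproducible
layout = {}
for paper in newspapers:
    order = algorithms[:]  # every treatment appears once per block (complete RBD)
    random.shuffle(order)  # randomized order within the block
    layout[paper] = order

for paper, order in layout.items():
    print(f"{paper}: {order}")
```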

1.3.7 Material used and Steps


• During the process of data collection, four major online newspapers were taken:
o The New York Times
o BBC
o Hindustan Times
o Times of India
• News articles from different randomly chosen genres, such as international/world news,
business, and entertainment, were scraped using the software Parsehub.
• The three algorithms (Viterbi, Brill tagger, and Baum-Welch) were coded in Python and
can be found at https://github.com/Kinkshuk/POS-Tagging_DOE
• The combined articles for each newspaper were saved in a text file, and the algorithms
were run on these files.


Example:

A New York Times news article dated 26 April 2022.

The article scraped using Parsehub into a text file.

POS tags assigned to the words in the article using the Viterbi algorithm.

Accuracy of the Viterbi algorithm for the given article.


1.3.8 FINAL DATA TABLE

Newspaper          Viterbi    Brill-tagger   Baum-Welch
BBC                99.472     94.17525       96.2873
Hindustan Times    99.348     93.84806       96.95329
New York Times     99.313     94.29486       95.80125
Times of India     99.141     92.90596       95.48093

The above table shows the accuracies (in %) achieved by the algorithms on online news
articles from the different newspapers.

1.3.9 ANALYSIS
In our experimental design, we have 3 algorithms as treatments, and 4 newspapers
as blocks to make a RANDOMIZED BLOCK DESIGN (RBD).

Yij = µ + αi + βj + eij   (i = 1, 2, 3 and j = 1, 2, 3, 4)

where µ = general mean, αi = fixed effect due to the ith treatment (algorithm),
βj = fixed effect due to the jth block (newspaper), and eij = random error.

HYPOTHESIS:
o Null hypothesis H0: all treatments are homogeneous, i.e. α1 = α2 = α3.

o Alternative hypothesis H1: at least two αi's are different.

o Null hypothesis H01: a given pair of treatments has the same effect, i.e. αi = αi'.

o Alternative hypothesis H11: the pair of treatments has different effects, i.e. αi ≠ αi'.


ANOVA TABLE

Tests of Between-Subjects Effects
Dependent Variable: Accuracy

Source                    Type III Sum of Squares    df    Mean Square    F            Sig.
Corrected Model           62.690a                    5     12.538         71.466       .000
Intercept                 111557.997                 1     111557.997     635878.892   .000
Algorithms (Treatments)   61.268                     2     30.634         174.613      .000
Newspapers (Blocks)       1.422                      3     .474           2.702        .139
Error                     1.053                      6     .175
Total                     111621.740                 12
Corrected Total           63.743                     11

a. R Squared = .983 (Adjusted R Squared = .970)

Since the p-value for treatments is less than 0.05, we have sufficient evidence to reject
the null hypothesis H0 and can conclude that the algorithms differ significantly. The
block (newspaper) effect, with p-value 0.139 > 0.05, is not significant.
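
The ANOVA above can be reproduced from the final data table with pandas and statsmodels; this is a sketch of the analysis, not necessarily the software that produced the SPSS-style output shown here.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Accuracies from the final data table (one row per newspaper-algorithm cell)
data = {
    "newspaper": ["BBC", "Hindustan Times", "New York Times", "Times of India"] * 3,
    "algorithm": ["Viterbi"] * 4 + ["Brill-tagger"] * 4 + ["Baum-Welch"] * 4,
    "accuracy":  [99.472, 99.348, 99.313, 99.141,
                  94.17525, 93.84806, 94.29486, 92.90596,
                  96.2873, 96.95329, 95.80125, 95.48093],
}
df = pd.DataFrame(data)

# RBD model: accuracy ~ algorithm (treatment) + newspaper (block), no interaction.
model = smf.ols("accuracy ~ C(algorithm) + C(newspaper)", data=df).fit()

# Type II sums of squares; for this balanced design the factor sums of squares
# coincide with the Type III values reported in the table above.
print(anova_lm(model, typ=2))
```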


INDIVIDUAL DIFFERENCES
Multiple Comparisons
Dependent Variable: Accuracy
Tukey HSD

(I) Algorithm   (J) Algorithm   Mean Difference (I-J)   Std. Error   Sig.    95% CI Lower   95% CI Upper
Viterbi         Brill-tagger     5.5123*                 .29617       .000     4.6036         6.4211
Viterbi         Baum-Welch       3.1877*                 .29617       .000     2.2789         4.0964
Brill-tagger    Viterbi         -5.5123*                 .29617       .000    -6.4211        -4.6036
Brill-tagger    Baum-Welch      -2.3247*                 .29617       .001    -3.2334        -1.4159
Baum-Welch      Viterbi         -3.1877*                 .29617       .000    -4.0964        -2.2789
Baum-Welch      Brill-tagger     2.3247*                 .29617       .001     1.4159         3.2334

Based on observed means.
The error term is Mean Square(Error) = .175.

*. The mean difference is significant at the .05 level.

Comparing the individual differences, we see that the mean difference is significant for
every pair of algorithms (p-value < 0.05), so we reject the null hypothesis H01 and
conclude that all pairs of algorithms differ significantly.
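
The pairwise comparisons can likewise be sketched with statsmodels' Tukey HSD routine, reusing the df DataFrame built in the ANOVA sketch above. Note that pairwise_tukeyhsd performs a one-way comparison and ignores the blocking factor, so its standard errors differ slightly from those in the table above, which use the RBD error mean square (0.175 on 6 degrees of freedom).

```python
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Tukey HSD over the three algorithms, pooling across newspapers
tukey = pairwise_tukeyhsd(endog=df["accuracy"], groups=df["algorithm"], alpha=0.05)
print(tukey.summary())
```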


1.4 CONCLUSION
The RBD experiment showed that the accuracies of the part-of-speech tagging algorithms
differ significantly across the newspapers tested, while the newspaper (block) effect was
not significant at the 5% level. The individual pairwise differences between the
algorithms are also significant for all pairs.

1.5 BIBLIOGRAPHY
o https://medium.com/@zhe.feng0018/coding-viterbi-algorithm-for-hmm-from-scratch-ca59c9203964
o https://www.nltk.org/
o https://www.projectpro.io/recipes/what-is-brill-tagger
o http://www.adeveloperdiary.com/data-science/machine-learning/derivation-and-implementation-of-baum-welch-algorithm-for-hidden-markov-model/
