Sample Thesis Report 152-15-5611 Jamiur
Supervised By
This thesis titled “Pharmacovigilance study of opioid drugs on Twitter and PubMed
using artificial intelligence”, submitted by Md. Jamiur Rahman Rifat to the
Department of Computer Science and Engineering, Daffodil International University,
has been accepted as satisfactory in partial fulfillment of the requirements for the
degree of B.Sc. in Computer Science and Engineering and approved as to its style
and contents.
BOARD OF EXAMINERS
I hereby declare that this thesis has been done by me under the supervision of Dr.
Sheak Rashed Haider Noori, Associate Professor and Associate Head, Department
of CSE, Daffodil International University. I also declare that neither this thesis nor
any part of it has been submitted elsewhere for the award of any degree or diploma.
Supervised by:
Submitted by:
I have given my efforts to this thesis. However, it would not have been possible without
the kind support and help of many individuals. I would like to express my deepest
appreciation to all those who provided me the possibility to complete this report.
First, I express my heartfelt thanks and gratitude to almighty Allah for His divine
blessings, which allowed me to complete this thesis successfully.
I owe special gratitude to my supervisor, Dr. Sheak Rashed Haider Noori, Associate
Professor and Associate Head of the CSE department, whose stimulating suggestions
and encouragement helped me to coordinate my thesis, especially in writing this
report. His endless patience, scholarly guidance, constant and energetic supervision,
constructive criticism, and valuable advice have made it possible to complete
this thesis.
Furthermore, I would like to acknowledge with much appreciation the crucial role
of my department head, Professor Dr. Syed Akhter Hossain, who gave me his
precious time and kind help to finish this thesis. I also give my deepest thanks to all the
faculty members and staff of the CSE department of Daffodil International University.
Finally, I must acknowledge with due respect the constant support and patience of my
parents.
CHAPTER 1: INTRODUCTION
1.1 Introduction
1.2 Motivation
1.3 Rationale of the Study
1.4 Output
1.5 Report Layout
CHAPTER 2: BACKGROUND
2.1 Introduction
2.2 Related Works
CHAPTER 3: RESEARCH METHODOLOGY
CHAPTER 5: CONCLUSIONS
PLAGIARISM REPORT
1.2 Motivation
A review by Sarkar et al. [10] showed that adverse drug reactions (ADRs) incur an
annual cost of 75 billion dollars in hospital-related activities. Opioids are a class of
drugs used for pain management and anesthesia. They place a significant burden on
health care systems and, in recent decades, have become a major contributor to
ADR-related health care costs. It has been estimated that the monetary toll of prescription
opioid abuses and overdoses in the United States was 78 billion dollars in 2013 with
Norbutas [28] explored a crypto market to uncover the structure of the illegal trade in
opioid drugs, using Exponential Random Graph Models (ERGMs) to accomplish this
goal.
“I had to raise slow down & rest cause of stronger pains this lasr week. More
#Morphin I have already had several operations of the vertebral column and I'm
going to have it others in the coming months.”
“GIVE ME SOME MORFINE.”
morphine MRFN
morfine MRFN
moaphine MFN
morphin MRFN
Applying the above-mentioned procedure gave us the final variant list, part of which
is shown in Table 3.2.
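The spelling variants listed earlier in this section are each paired with a short phonetic code. As a rough illustration of how such a code groups misspellings, here is a minimal sketch. It is a toy consonant-skeleton key, not the encoder used in the thesis: the function names and the rules (map "ph" to "f", drop vowels, collapse repeated letters) are assumptions chosen only because they reproduce the four codes listed.

```python
import re


def phonetic_key(word: str) -> str:
    """Toy phonetic key (an assumption, not the thesis's actual encoder)."""
    letters = re.sub(r"[^a-z]", "", word.lower())  # keep letters only
    letters = letters.replace("ph", "f")           # "ph" sounds like "f"
    consonants = re.sub(r"[aeiouy]", "", letters)  # drop vowels and "y"
    collapsed = re.sub(r"(.)\1+", r"\1", consonants)  # collapse doubled letters
    return collapsed.upper()


def matching_variants(candidates, target):
    """Return the candidates whose key equals the key of the target spelling."""
    target_key = phonetic_key(target)
    return [c for c in candidates if phonetic_key(c) == target_key]


candidates = ["morphine", "morfine", "moaphine", "morphin"]
keys = {word: phonetic_key(word) for word in candidates}
same_as_morphine = matching_variants(candidates, "morphine")
```

With this toy key, "morfine" and "morphin" share the key of "morphine", while "moaphine" does not, so a filter based purely on key equality would miss it.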
Finally, we removed duplicate tweets and Unicode characters from the tweets. This left
4633 tweets for manual annotation. These data are also used by the binary
classifier.
For topic modeling, we kept the retweets and the tweets containing external links.
However, before removing duplicate tweets, we also removed Unicode characters, the ‘@’
and ‘#’ characters, punctuation, and numerical values. This left 60140
tweets to feed to our topic modeling algorithm.
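The cleaning steps described for topic modeling can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's own script: "Unicode characters" is taken to mean non-ASCII characters, and the function names `clean_for_topic_model`, `deduplicate`, and `prepare_corpus` are hypothetical.

```python
import re
import string


def clean_for_topic_model(text: str) -> str:
    """Strip non-ASCII characters, digits, and punctuation (including @ and #)."""
    text = text.encode("ascii", "ignore").decode("ascii")  # assumption: "Unicode" = non-ASCII
    text = re.sub(r"\d+", " ", text)                        # numerical values
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()                # tidy whitespace


def deduplicate(tweets):
    """Drop exact duplicates while keeping the first occurrence and the order."""
    seen = set()
    unique = []
    for tweet in tweets:
        if tweet not in seen:
            seen.add(tweet)
            unique.append(tweet)
    return unique


def prepare_corpus(tweets):
    """Clean every tweet first, then remove duplicates, as described above."""
    return deduplicate(clean_for_topic_model(t) for t in tweets)


corpus = prepare_corpus([
    "@user More #Morphin 2 day",
    "user More Morphin day!",
])
```

Cleaning before deduplication means that tweets differing only in mentions, hashtags, digits, or punctuation collapse into one entry, as the two sample strings do here.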
we got 89 tweets containing ADRs and 9 tweets bearing indications. Figure 3.2
shows some of the annotated tweets.
“Buy Tramadol Online Video Looking for a tramadol? Not a problem! Buy
tramadol online ==>”
“I recently bought your Hydro Skin Lip Therapy product.. But I spend more time
trying to get it out the nozzle”
“Tulsans and Oklahomans who live nearby spend between $18.7 and nearly $21
million per year on heroin. That's money”
@Benjih1 Misplaced a vial of morphine today. Turns out it causes quite a commotion ....
Top tip dont misplace a vial of morphine.
3.8.3 Models
A Convolutional Neural Network (CNN), a Convolutional Recurrent Neural Network
(CRNN), a Recurrent Neural Network (RNN), and a Recurrent Convolutional Neural Network
(RCNN) were implemented using the Sequential model of Keras, and a Convolutional
Neural Network with Attention (CNNA) was implemented with the functional API. The “relu”
activation function was applied in the hidden layers. For optimization, we used the
“adam” optimizer in all models.
CNN
The model started with an embedding layer, followed by a dropout of 0.25. In the
convolution layer we used a filter size of 32. Next, we used a max pooling layer with a
pool size of 2. The output was then flattened and passed to a hidden layer of 250 nodes.
The output layer was a single node with a sigmoid activation function.
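The CNN described above might look roughly like this in Keras. This is a sketch under assumptions, not the thesis's code: "filter size of 32" is read here as 32 convolution filters, and the kernel size, padding, vocabulary size, embedding dimension, sequence length, and loss function are hypothetical values that the text does not state.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_cnn(vocab_size=5000, embed_dim=100, max_len=50):
    """Sequential CNN sketch; all default arguments are illustrative assumptions."""
    model = keras.Sequential([
        keras.Input(shape=(max_len,)),            # padded token ids
        layers.Embedding(vocab_size, embed_dim),  # embedding layer
        layers.Dropout(0.25),                     # dropout of 0.25
        # Assumption: 32 filters; kernel_size=3 is not given in the text.
        layers.Conv1D(filters=32, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),         # pool size 2
        layers.Flatten(),
        layers.Dense(250, activation="relu"),     # hidden layer of 250 nodes
        layers.Dense(1, activation="sigmoid"),    # single sigmoid output node
    ])
    # The "adam" optimizer is from the text; the loss is an assumption.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


cnn = build_cnn()
```

Calling `cnn.summary()` prints the layer stack, which is a quick way to check that the order matches the description.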
CRNN
The first layer was an embedding layer. Its output was then convolved using a filter
size of 32, and max pooling was applied with a pool size of 2. Then instead of flattening we
RNN
This model also started with an embedding layer. We used a GRU layer of 300 nodes,
which is a variant of RNN, followed by a dropout of 0.5. The output layer is a
single node with a “sigmoid” activation function.
RCNN
As usual, the model started with an embedding layer. A basic RNN layer comprising 300
nodes was then used, followed in order by a convolutional layer, a max pooling layer, and
flattening. A dropout of 0.50 was applied. Lastly, the output layer
was a single node with a sigmoid activation function.
CNNA
This model began with an embedding layer, followed by a dropout of
0.25. The output matrix was then convolved using a filter value of 32. After that we
incorporated the attention mechanism. In the attention layer the “softmax” activation
function was used. The output was then flattened and fed to a single output node with
a sigmoid activation function.
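A functional-API sketch of the CNNA follows. The text does not say what form the attention takes, so the version here (one score per time step, softmax over time steps, then re-weighting the convolution output) is an assumption, as are the kernel size, padding, input dimensions, and loss function.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_cnna(vocab_size=5000, embed_dim=100, max_len=50):
    """Functional-API CNN with a simple attention step (illustrative assumptions)."""
    tokens = keras.Input(shape=(max_len,), dtype="int32")
    x = layers.Embedding(vocab_size, embed_dim)(tokens)
    x = layers.Dropout(0.25)(x)
    # Assumption: 32 filters; kernel_size=3 is not given in the text.
    x = layers.Conv1D(filters=32, kernel_size=3, padding="same", activation="relu")(x)

    # Assumed attention form: score each time step, normalize with softmax
    # across time steps, and scale the features by those weights.
    scores = layers.Dense(1)(x)                 # shape (batch, max_len, 1)
    weights = layers.Softmax(axis=1)(scores)    # softmax over the time axis
    attended = layers.Multiply()([x, weights])  # broadcast over the 32 channels

    flat = layers.Flatten()(attended)
    output = layers.Dense(1, activation="sigmoid")(flat)

    model = keras.Model(inputs=tokens, outputs=output)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


cnna = build_cnna()
```

The functional API is needed here because the convolution output is used twice, once to compute the attention weights and once to be re-weighted, which a purely sequential stack cannot express.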
Our models output a probability for the class. If the probability is greater than
0.50, we classify the tweet as class “1”, meaning that it contains ADR-related
information; otherwise we classify it as not containing such information.
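The decision rule can be written in a few lines. In this sketch the label 0 for the non-ADR class and the sample probabilities are assumptions made for illustration.

```python
def classify(probability: float, threshold: float = 0.5) -> int:
    """Return 1 (ADR-related) if the probability exceeds the threshold, else 0."""
    return 1 if probability > threshold else 0


# Made-up probabilities, only to show the rule in action.
probabilities = [0.91, 0.50, 0.12]
labels = [classify(p) for p in probabilities]
```

Because the rule is "more than 0.50", a probability of exactly 0.50 falls into the negative class.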
4.1 Results
Twitter 98 48 50 48
PubMed 100 77 23 15
Table 4.1 MedDRA encoding performance
Table 4.1 shows a surprising result. It was believed that MedDRA is
good at mapping structured data sources like PubMed, but we found that the encoding
platform works much better on social media sites like Twitter. In PubMed we found
fewer PT terms than LLT terms.
We evaluated our methods using recall, precision, and F1 score,
splitting our dataset in a 70:30 ratio, with 70% of the data used for training and 30%
for testing. The performance of the models is close, with CRNN
performing better than the others. We also observed that skewness could strongly affect
model performance. Table 4.3 shows the predictions of our models
on four tweets.
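For reference, the 70:30 split and the three metrics can be sketched in plain Python. The function names, the random seed, and the toy labels below are assumptions for illustration. They are not the thesis's data or code.

```python
import random


def split_70_30(items, train_percent=70, seed=42):
    """Shuffle a copy of the items and split it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)          # seed is an arbitrary choice
    cut = len(shuffled) * train_percent // 100     # integer arithmetic, no rounding issues
    return shuffled[:cut], shuffled[cut:]


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


train, test = split_70_30(range(10))
precision, recall, f1 = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

On a skewed dataset these positive-class metrics are more informative than accuracy, since a model that always predicts the majority class scores zero recall.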
Table 4.2 Evaluation metrics of the different models, with the best values marked in bold
The results obtained from the various models do not vary much, but overall CRNN
came out on top with an F1 score of 0.71. Figure 4.1 compares the
performance of each model.
In our study we have brought together different techniques that were previously explored
individually to analyze pharmacovigilance from Twitter and PubMed for opioid drugs. Opioids
are a specialized class of drugs that is misused more than any other. Illegal
distribution of these drugs can increase drug addiction in society. Our system can
find such information in large volumes of data. We also evaluated the performance of
MetaMap on both PubMed and Twitter data. The binary classifier results suggest that the
availability of quality data matters much more than the choice of algorithm, as
performance is similar across the different models. One drawback of our study is the
lack of quality data, which could be remedied by using other health-related social
media sites. Anyone who wants to use Twitter or Facebook needs to define
the keyword list precisely so that noisy data do not creep in.
[18] Sarker, Abeed, Rachel Ginn, Azadeh Nikfarjam, Karen O’Connor, Karen Smith, Swetha
Jayaraman, Tejaswi Upadhaya, and Graciela Gonzalez. "Utilizing social media data for
pharmacovigilance: a review." Journal of biomedical informatics 54 (2015): 202-212.
[19] Ginn, Rachel, Pranoti Pimpalkhute, Azadeh Nikfarjam, Apurv Patki, Karen O’Connor,
Abeed Sarker, Karen Smith, and Graciela Gonzalez. "Mining Twitter for adverse drug reaction
mentions: a corpus and classification benchmark." In Proceedings of the fourth workshop on
building and evaluating resources for health and biomedical text processing. 2014.
[20] Sarker, Abeed, Karen O’Connor, Rachel Ginn, Matthew Scotch, Karen Smith, Dan
Malone, and Graciela Gonzalez. "Social media mining for toxicovigilance: automatic
monitoring of prescription medication abuse from Twitter." Drug safety 39, no. 3 (2016): 231-240.
[21] Chee, Brant W., Richard Berlin, and Bruce Schatz. "Predicting adverse drug events from
personal health messages." In AMIA Annual Symposium Proceedings, vol. 2011, p. 217.
American Medical Informatics Association, 2011.
[22] Zorzi, Margherita, Carlo Combi, Riccardo Lora, Marco Pagliarini, and Ugo Moretti.
"Automagically encoding adverse drug reactions in MedDRA." In Healthcare Informatics
(ICHI), 2015 International Conference on, pp. 90-99. IEEE, 2015.
[23] Pimpalkhute, Pranoti, Apurv Patki, Azadeh Nikfarjam, and Graciela Gonzalez. "Phonetic
spelling filter for keyword selection in drug mention mining from social media." AMIA Summits
on Translational Science Proceedings 2014 (2014): 90.
[24] Nikfarjam, Azadeh, Abeed Sarker, Karen O’Connor, Rachel Ginn, and Graciela Gonzalez.
"Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence
labeling with word embedding cluster features." Journal of the American Medical Informatics
Association 22, no. 3 (2015): 671-681.
[25] Limsopatham, Nut, and Nigel Henry Collier. "Normalising medical concepts in social
media texts by learning semantic representation." (2016).
[26] Katsuki, Takeo, Tim Ken Mackey, and Raphael Cuomo. "Establishing a link between
prescription drug abuse and illicit online pharmacies: analysis of Twitter data." Journal of
medical Internet research 17, no. 12 (2015).
[27] Mackey, Tim K., Janani Kalyanam, Takeo Katsuki, and Gert Lanckriet. "Twitter-based
detection of illegal online sale of prescription opioid." American journal of public health 107,
no. 12 (2017): 1910-1915.