AMIA SMM4H Workshop
1 http://www.pewinternet.org/fact-sheets/health-fact-sheet/
2 http://www.statisticbrain.com/twitter-statistics/
2 http://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/
3 http://www.internetlivestats.com/twitter-statistics/
SM data for pharmacovigilance: Significance
Patient reporting brings a different perspective (34
studies): more detail and information on the severity and impact of ADRs in
daily life (PMID 27558545).
Abundant adverse event reports in SM: 29 studies that
compared SM to other sources found a higher frequency
of adverse events in social media, particularly for
‘symptom’-related and ‘mild’ adverse events
(PMID 26271492).
SM data for pharmacovigilance: Challenges
Incompleteness:
• Not all health conditions may be revealed
• Complete data about individual cases may be difficult to obtain: taking
drug X, but dosage, frequency etc. information may be missing
• Participants from the cohort may drop out at higher rates
Accessibility:
• Data from social media is not easily collected: API limitations
• Not easily “digestible” once collected: challenging to process using
automatic methods
• Data collection methods may have to be changed frequently over time
Authenticity:
• Bots – a large portion of social media is now generated by bots,
making it harder to mine reliable data
• Automatic processing of postings is often misled: a posting that mentions
a drug does not necessarily indicate intake.
Social Media pipeline for ADE detection
[Figure: pipeline stages shown are data collection, annotation, classification, concept extraction, concept mapping, and signal detection, with a drug-ADE pair as output.]
A taste of Twitter ADR lingo
1 Pimpalkhute et al. Phonetic Spelling Filter for Keyword Selection. AMIA Jt Summits Transl Sci Proc. 2014.
2 O’Connor et al. Pharmacovigilance on Twitter. AMIA Annu Symp Proc. 2014.
3 Ginn et al. Mining Twitter for adverse drug reaction mentions. BioTxtM. 2014.
Annotation example
Other: diabetes
Indication: crying (C0010399)
Adverse reaction: emotional indifference (C0001726)
stops me from crying most of the time, blocks most of my
feelings
Text classification
c4 Drug dosage 1000mg, 100mg, .10, 10mg, 600mg, 0.25, .05, ...
c6 Family member brother, dad, daughter, father, husband, mom, mother, son, wife, …
c7 Date 1992, 2011, 2012, 23rd, 8th, april, aug, august, december, …
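The cluster rows above can be turned into classification features by counting how many tokens of a tweet fall into each cluster. The following is a minimal Python sketch, not the classifier used in the talk; the tokenizer, the function name `cluster_features`, and the example sentence are assumptions made for illustration, and the word lists contain only the members visible in the rows above.

```python
import re
from collections import Counter

# Word clusters copied from the rows above; the source lists are truncated
# ("..."), so only the visible members are included here.
CLUSTERS = {
    "c4": ["1000mg", "100mg", ".10", "10mg", "600mg", "0.25", ".05"],
    "c6": ["brother", "dad", "daughter", "father", "husband",
           "mom", "mother", "son", "wife"],
    "c7": ["1992", "2011", "2012", "23rd", "8th",
           "april", "aug", "august", "december"],
}

# Reverse index: word -> cluster id.
WORD_TO_CLUSTER = {word: cid for cid, words in CLUSTERS.items() for word in words}


def cluster_features(text):
    """Count, per cluster id, how many tokens of `text` belong to that cluster."""
    # Assumed tokenizer: runs of word characters and dots, lowercased,
    # with trailing sentence punctuation stripped.
    tokens = [t.rstrip(".") for t in re.findall(r"[\w.]+", text.lower())]
    return Counter(WORD_TO_CLUSTER[t] for t in tokens if t in WORD_TO_CLUSTER)


# Hypothetical example sentence, not taken from the annotated corpus.
print(cluster_features("My mom took 10mg in August."))
```

Counts like these could be appended to other features (for example n-grams) before training a classifier; how the original system weights them is not stated in the slides.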
Concept mapping: translating AEs
Given a (potential) concept mention from the
consumer, find its likely corresponding term
An AE given in user language is mapped to its UMLS
concept ID (MedDRA: Medical Dictionary for
Regulatory Activities)
Extremely difficult due to colloquialisms: the level
of detail and ambiguity precludes fine-grained
mapping.
To make the problem “solvable”: group similar
effects into a high-level concept
MedDRA
Normalization – HSA
• We propose a system called hybrid semantic analysis (HSA)
that combines rule-based and semantic matching algorithms
• Maps user-generated mentions to concept IDs in the
Unified Medical Language System
• The semantic matching component of HSA is adaptive in
nature:
• It adapts to a particular normalization task by combining a set of text-
based resources and semantic relatedness measures
• A regression model is used to combine the different semantic
relatedness measures
Emadzadeh E, Sarker A, Nikfarjam A, Gonzalez G. Hybrid Semantic Analysis for Mapping
Adverse Drug Reaction Mentions in Tweets to Medical Terminology. Annual Symposium of
the American Medical Informatics Association, Nov 2017.
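The idea of combining several semantic relatedness measures into one score can be sketched as a weighted sum followed by picking the highest-scoring candidate concept. This is only an illustration under stated assumptions, not the HSA implementation: in HSA the weights would come from a fitted regression model, whereas the weights, the relatedness scores, and the function names below are hypothetical. The two concept IDs are the ones from the annotation example.

```python
def combined_score(scores, weights, bias=0.0):
    """Linear combination of relatedness measures: bias + sum(w_i * s_i)."""
    return bias + sum(weights[name] * value for name, value in scores.items())


def best_concept(candidates, weights):
    """Return the concept ID whose combined relatedness score is highest."""
    return max(candidates, key=lambda cid: combined_score(candidates[cid], weights))


# Hypothetical weights; a regression model would learn these from annotated data.
weights = {"syntactic": 0.2, "lsa_adr_tweets": 0.5, "lsa_umls_defs": 0.3}

# Hypothetical relatedness scores for the mention "blocks most of my feelings".
candidates = {
    "C0001726": {"syntactic": 0.1, "lsa_adr_tweets": 0.8, "lsa_umls_defs": 0.6},
    "C0010399": {"syntactic": 0.3, "lsa_adr_tweets": 0.2, "lsa_umls_defs": 0.3},
}

print(best_concept(candidates, weights))
```

A learned combination lets a measure that works well for tweet vocabulary outweigh one that works poorly, instead of relying on any single measure.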
Performance
Method               Precision  Recall  F-Measure
Syntactic            88.0       35.7    50.8
LSA-PubM-Dental      83.6       38.2    52.4
LSA-PubM-Nursing     83.1       38.6    52.7
LSA-UMLS-Defs        86.5       40.3    55.0
LSA-Reuters          81.5       44.9    57.9
LSA-PubM-Systematic  83.6       44.4    58.0
LSA-ADR-Tweets       84.6       47.7    61.0
MetaMap              82.6       18.7    30.5
HSA                  82.3       50.2    62.4
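The F-Measure column is consistent with the harmonic mean of precision and recall. A short Python check, assuming the balanced F-measure (the slides do not say which variant was used), reproduces three of the reported values:

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows copied from the table above: (precision, recall, reported F-measure).
rows = {
    "Syntactic": (88.0, 35.7, 50.8),
    "MetaMap": (82.6, 18.7, 30.5),
    "HSA": (82.3, 50.2, 62.4),
}

for name, (p, r, reported) in rows.items():
    print(name, round(f_measure(p, r), 1), reported)
```

The check also makes the trade-off visible: MetaMap's precision is close to HSA's, but its low recall pulls its F-measure far below.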
Social media analysis pipeline drug by drug
Collect data from Twitter using the Public API
(https://dev.twitter.com/streaming/public) with the keywords
humira, adalimumab, and common misspellings.
Process tweets with the ADRMine system to extract mentions of
ADEs.
Map to standard concepts in the Unified Medical Language
System (UMLS) using an automatic lexical comparison plus
manual annotation.
Categorize UMLS concepts by frequency.
Run disproportionality analyses: proportional reporting ratio
(PRR) and reporting odds ratio (ROR).
Compare against other sources: pharmacovigilance data,
Micromedex, Lexicomp, Clinical Pharmacology, as well as
systematic reviews/meta-analyses, cohort studies and case-
control studies in the published literature.
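The disproportionality step in the pipeline above can be sketched with the standard 2x2 contingency-table definitions of PRR and ROR. The cell names `a`, `b`, `c`, `d` and the counts in the example are assumptions for illustration, not data from the study:

```python
def prr(a, b, c, d):
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)].

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    return (a / (a + b)) / (c / (c + d))


def ror(a, b, c, d):
    """Reporting odds ratio: (a / b) / (c / d) = (a * d) / (b * c)."""
    return (a * d) / (b * c)


# Hypothetical counts; a zero in b or c would raise ZeroDivisionError here,
# so real analyses usually apply a continuity correction or skip such cells.
a, b, c, d = 10, 90, 20, 880
print(f"PRR = {prr(a, b, c, d):.2f}, ROR = {ror(a, b, c, d):.2f}")
```

Values well above 1 suggest the event is reported disproportionately often with the drug; in practice they are read together with confidence intervals and a minimum case count.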
Prescription drug abuse mentions
Sarker A, O'Connor K, Ginn R, Scotch M, Smith K, Malone D, Gonzalez G. Social media mining for
toxicovigilance: automatic monitoring of prescription medication abuse from Twitter, Drug Safety,
2016 Mar;39(3):231-40. (PMID: 26748505)
Adderall® vs. oxycodone abuse patterns
Supervised classification to investigate patterns of
abuse-related tweets¹
1 Sarker et al. Social media mining for toxicovigilance. Drug Saf. 2016.
Next steps: Mining temporal data
gragon@pennmedicine.upenn.edu
Twitter: @gracielagon
HLP lab:
https://healthlanguageprocessing.org