Identification of Confusable Drug Names: A New Approach and Evaluation Methodology
4.2 A Generalized n-gram Measure

Although it may not be immediately apparent, LCSR can be viewed as a variant of the n-gram approach. If n is set to 1, the Dice coefficient formula returns the number of shared letters divided by the average length of the two strings. Let us call this measure UNIGRAM. The main difference between LCSR and UNIGRAM is that the former obeys the no-crossing-links constraint, which stipulates that the matched unigrams must form a subsequence of both of the compared strings, whereas the latter disregards the order of unigrams. E.g., for pat/tap, LCSR returns 0.33 because the length of the longest common subsequence is 1, while UNIGRAM returns 1.0 because all letters are shared. The other, minor difference is that the denominator of LCSR is the length of the longer string, as opposed to the average length of the two strings in UNIGRAM. (In fact, LCSR is sometimes defined with the average length in the denominator.)

We define a generalized measure based on n-grams with the following parameters:

1. The value of n.

2. The presence or absence of the no-crossing-links constraint.

3. The number of segments appended to the beginning and the end of the strings.

4. The length normalization factor: either the maximum or the average length of the strings.

A number of commonly used similarity measures can be expressed in the above framework. The combination of n = 1 with the no-crossing-links constraint produces LCSR. By selecting n = 2 and the average normalization factor, we obtain the BIGRAM measure. Thirteen out of twenty-two measures tested by Lambert et al. (1999) are variants that combine either n = 2 or n = 3 with various lengths of appended segments.

So far, we have assumed that there are only two levels of similarity between n-grams: 1 if the n-grams are identical, and 0 otherwise. A multi-valued similarity function instead assigns intermediate values to partially identical n-grams. The scale distinguishes levels of similarity, including 1 for identical bigrams, and 0 for completely distinct bigrams.6

The notion of a similarity scale between n-grams requires clarification in the case of n-grams partially composed of segments appended to the beginning or end of strings. Normally, extra affixes are composed of one or more copies of a unique special symbol, such as space, that does not belong to the string alphabet. We define an alphabet of special symbols that contains a unique symbol for each of the symbols in the original string alphabet. The extra affixes are assumed to contain copies of the special symbols that correspond to the initial symbol of the string. In this way, the similarity between pairs of n-grams in which one or both of the n-grams overlap with an extra affix is guaranteed to be either 0 or 1.

4.3 BI-SIM

We propose a new measure of orthographic similarity, called BI-SIM, that aims at combining the advantages of the context inherent in bigrams, the precision of unigrams, and the strength of the no-crossing-links constraint. BI-SIM belongs to the class of n-gram measures defined above. Its parameters are: n = 2, the no-crossing-links constraint enforced, a single segment appended to the beginning of the string, normalization by the length of the longer string, and multi-valued n-gram similarity. The rationale behind the specific settings is as follows. n = 2 is a minimum value that provides context for matching segments within a string. The no-crossing-links constraint guarantees the sequentiality of segment matches. The segment added to the beginning increases the importance of the match of the initial segment. The normalization method favors associations between words of similar length. Finally, the refined n-gram similarity scale increases the resolution of the measure.

BI-SIM is defined by the following recurrence, where x and y are the input strings (indexed after the extra segment has been prepended), s is the bigram similarity scale described above, and d(i, 0) = d(0, j) = 0:

    d(i, j) = max{ d(i-1, j-1) + s(x_{i-1} x_i, y_{j-1} y_j),
                   d(i-1, j),
                   d(i, j-1) }

    BI-SIM(x, y) = d(|x|, |y|) / max(|x|, |y|)
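The BI-SIM computation can be sketched as a small dynamic program. The following is an illustrative reconstruction from the parameter settings listed above, not the authors' code: we assume the multi-valued bigram similarity is the fraction of positions in which two bigrams agree, and we use a duplicated initial letter as a stand-in for the special prefix symbol.

```python
def bisim(x: str, y: str) -> float:
    """Illustrative BI-SIM sketch: bigram dynamic programming with the
    no-crossing-links constraint, one segment appended to the beginning
    of each string, and normalization by the longer string's length."""
    if not x or not y:
        return 0.0
    # Stand-in for the special prefix symbol: duplicate the initial
    # letter, so two prefix bigrams match fully iff the two strings
    # start with the same letter (similarity 0 or 1).
    xs, ys = x[0] + x, y[0] + y
    n, m = len(x), len(y)

    def s(i: int, j: int) -> float:
        # Assumed multi-valued bigram similarity: the fraction of
        # positions in which bigram i of xs and bigram j of ys agree.
        return (int(xs[i] == ys[j]) + int(xs[i + 1] == ys[j + 1])) / 2.0

    # LCS-style recurrence: taking the max of these three cases enforces
    # the no-crossing-links constraint, so matched bigrams form a
    # subsequence of both strings.
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = max(d[i - 1][j - 1] + s(i - 1, j - 1),
                          d[i - 1][j],
                          d[i][j - 1])
    return d[n][m] / max(n, m)
```

Under these assumptions the sketch returns 1.0 for identical strings, 0.0 for strings sharing no letters, and an intermediate score such as 1/3 for the pat/tap pair discussed in Section 4.2.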
6 [...] set according to the values used for a distinct task of cross-language cognate identification.

[...] similarity measures implied by our results. In Figure 1, the macro-averaged recall values7 achieved by several measures on the USP set are plotted. Table 5 contains detailed results for the top 10 and top 20 cutoffs.

7 We could have also chosen to micro-average the recall values by dividing the total number of true positives discovered among the top candidates by the total number of true positives.
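The distinction between macro- and micro-averaged recall raised in footnote 7 can be made concrete with a short sketch; the function and the counts used below are illustrative, not taken from the paper.

```python
def averaged_recalls(found, relevant):
    """Illustrative macro- vs. micro-averaged recall.
    found[i]    : true positives discovered among the top candidates
                  for the i-th query name
    relevant[i] : total number of true positives for the i-th name"""
    # Macro-average: compute recall per name, then average the recalls,
    # so every name contributes equally regardless of how many true
    # positives it has.
    macro = sum(f / r for f, r in zip(found, relevant)) / len(found)
    # Micro-average (footnote 7's alternative): pool the counts first,
    # so names with many true positives dominate the result.
    micro = sum(found) / sum(relevant)
    return macro, micro
```

For hypothetical counts found = [1, 2] and relevant = [1, 4], the macro-average is (1.0 + 0.5)/2 = 0.75, while the micro-average is 3/5 = 0.6; the two statistics diverge whenever the number of true positives varies across names.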
[Figure 1: Recall at various thresholds for the USP test set. Curves shown: COMBINED, BI-SIM, EDITEX, NED, TRIGRAM-2B, BIGRAM, PREFIX, SOUNDEX.]

[Figure 2: Recall at various thresholds for the sound-alike test set. Curves shown: COMBINED, ALINE, EDITEX, NED, TRIGRAM-2B, BIGRAM, PREFIX, SOUNDEX.]
                USP Set             Phono Set
                top 10    top 20    top 10    top 20
    PREFIX      0.5651    0.6658    0.2981    0.3478
    EDIT        0.7506    0.8130    0.5139    0.6410
    NED         0.7846    0.8489    0.5590    0.6639
    LCSR        0.7375    0.8333    0.4663    0.5769
    BIGRAM      0.6362    0.7148    0.3560    0.4400
    TRIGRAM-2B  0.7335    0.8251    0.4674    0.5355
    SOUNDEX     0.3965    0.4898    0.2331    0.3326
    EDITEX      0.7558    0.8155    0.5864    0.6911
    ALINE       0.7503    0.8303    0.5825    0.6873
    BI-SIM      0.8220    0.8927    0.4838    0.6590
    TRI-SIM     0.8324    0.8946    0.4782    0.6245
    COMBINED    0.8560    0.9137    0.6462    0.7737

Table 5: Recall at top 10 and top 20 for both the USP and the sound-alike test sets.

Since the USP set contains both look-alike and sound-alike name pairs, we conducted a second experiment to compare the performance of various measures on sound-alike pairs only. We used a proprietary list of 276 drug names identified by experts as “names of concern” for 83 “consult” names. None of the “consult” names and only about 25% of the “names of concern” are in the USP set, i.e., there are no true positive pairs shared between the two sets. The maximum number of true positives was 11, while the average for all names was 3.33.

The measures were applied to calculate the similarity between each of the 83 “consult” names and a list of 2596 drug names. The results are shown in Figure 2. Since the task, which involved identifying, on average, 3.33 true positives among 2596 candidates, was more challenging, the recall values are lower than in Figure 1. All drug names were first converted into a phonetic notation by means of a set of regular expression rules. (We found that phonetic transcription led to a slight improvement in the recall values achieved by the orthographic measures.) The parameters of ALINE used in this experiment were optimized beforehand on the USP set.

7 Discussion

The results described in Section 6 clearly indicate that BI-SIM and TRI-SIM, the newly proposed measures of similarity, outperform several currently used measures on the USP test set regardless of the cutoff. A simple combination of several measures achieves even higher accuracy. On the sound-alike confusion set, EDITEX and ALINE are the most effective. The accuracy achieved by the best measures is impressive. For the combined measure, the average recall on the USP set exceeds 90% with only the 15 top candidates considered.

The USP test set has its limitations. The set includes pairs that are considered confusable for reasons other than just phonetic or orthographic similarity, including illegible handwriting, incomplete knowledge of drug names, newly available products, similar packaging or labeling, and incorrect selection of a similar name from a computerized product list. In many cases, the names do not sound or look alike, but when handwritten or communicated verbally, these names have caused or could cause a mix-up. On the other hand, many clearly confusable name pairs are not identified as such (e.g. Erythromycin/Erythrocin, Neosar/Neoral, Lorazepam/Flurazepam, Erex/Eurax/Urex, etc.).

All similarity measures have their own
strengths and weaknesses. n-GRAM is effective at recognizing pairs such as Chlorpromazine/Prochlorperazine, where a shorter name closely matches parts of the longer name. However, this advantage is offset by its poor performance on similar-sounding names with few shared bigrams (Nasarel/Nizoral). LCSR is able to identify pairs where common subsequences are interleaved with dissimilar segments, such as Asparaginase/Pegaspargase, but fails on similar-sounding names where the overlap of identical segments is minimal (Luride/Lortab). ALINE detects phonetic similarity even when it is obscured by the orthography (e.g. Xanax/Zantac), but phonetic transcription is required beforehand.

The idiosyncrasies of individual measures are attenuated when they are combined, which may explain the excellent performance of the combined measure. Each measure is focused on a particular facet of string similarity: initial segments in PREFIX, phonetic sound-alike quality in ALINE, common clusters in bigram-based measures, overall transformability in EDIT, etc. For this reason, a synergistic blend of several measures achieves higher accuracy than any of its components.

Our experiments confirm that orthographic approaches are superior to their phonetic counterparts in tasks involving string matching (Zobel and Dart, 1995). Nevertheless, phonetic approaches identify many sound-alike names that are beyond the reach of orthographic approaches. In applications where the gap between spelling and pronunciation plays an important role, it is advisable to employ phonetic approaches as well. The two most effective ones are EDITEX and ALINE, but whereas ALINE is language-independent, EDITEX incorporates English-specific letter groups and rules.

8 Conclusion

We have investigated the problem of identifying confusable drug name pairs. The effectiveness of several word similarity measures was evaluated using a new recall-based evaluation methodology. We have proposed a new measure of orthographic similarity that outperforms several commonly used similarity measures when tested on a publicly available list of confusable drug names. On a test set containing solely sound-alike confusion pairs, phonetic approaches, ALINE and EDITEX, achieve the best results. Our results suggest that a linear combination of several measures benefits from the strengths of its components and is likely to outperform any individual measure. Such a combined approach has the potential to provide the basis for automatic minimization of medication errors.

The task of computing similarity between words is also important in other contexts. When an entered name does not exist in a bibliographic database, it is desirable to retrieve names that sound similar. Information retrieval systems may need to expand the search in cases where a typed query contains errors or variations in spelling. A related task, the identification of cognates, arises in statistical machine translation. The techniques discussed in this paper may also be applicable in those areas.

References

PDR 24th Ed. 2003. Physicians' Desk Reference for Nonprescription Drugs and Dietary Supplements. Thomson PDR, New York, NY.

Patrick A. V. Hall and Geoff R. Dowling. 1980. Approximate string matching. Computing Surveys, 12(4):381–402.

Grzegorz Kondrak. 2000. A new algorithm for the alignment of phonetic sequences. In Proceedings of the First Meeting of the North American Chapter of the Association for Computational Linguistics, pages 288–295, Seattle, WA.

Bruce L. Lambert, Swu-Jane Lin, Ken-Yu Chang, and Sanjay K. Gandhi. 1999. Similarity as a risk factor in drug-name confusion errors: The look-alike (orthographic) and sound-alike (phonetic) model. Medical Care, 37(12):1214–1225.

J. Lazarou, B. H. Pomeranz, and P. N. Corey. 1998. Incidence of adverse drug reactions in hospitalized patients. Journal of the American Medical Association, 279:1200–1205.

Michelle Meadows. 2003. Strategies to reduce medication errors. U.S. Food and Drug Administration Consumer Magazine, May-June.

I. Dan Melamed. 1999. Bitext maps and alignment via pattern recognition. Computational Linguistics, 25(1):107–130.

Gerard Salton. 1971. The SMART System: Experiments in Automatic Document Processing. Prentice Hall, Englewood Cliffs, NJ.

Esko Ukkonen. 1992. Approximate string-matching with q-grams and maximal matches. Theoretical Computer Science, 92:191–211.

Robert A. Wagner and Michael J. Fischer. 1974. The string-to-string correction problem. Journal of the ACM, 21(1):168–173.

Justin Zobel and Philip W. Dart. 1995. Finding approximate matches in large lexicons. Software — Practice and Experience, 25(3):331–345.

Justin Zobel and Philip Dart. 1996. Phonetic string matching: Lessons from information retrieval. In Proceedings of the 19th International Conference on Research and Development in Information Retrieval, pages 166–172.