
Journal of Clinical Epidemiology 67 (2014) 887–896

Validation of international algorithms to identify adults
with inflammatory bowel disease in health administrative data
from Ontario, Canada
Eric I. Benchimol a,b,c,d,*, Astrid Guttmann a,e,f, David R. Mack b,c, Geoffrey C. Nguyen a,g,h,
John K. Marshall i, James C. Gregor j, Jenna Wong a, Alan J. Forster a,k,l, Douglas G. Manuel a,d,l,m
a Institute for Clinical Evaluative Sciences, 2075 Bayview Avenue, Toronto, Ontario, Canada, M4N 3M5
b CHEO Inflammatory Bowel Disease Centre, Division of Gastroenterology, Hepatology and Nutrition, Children’s Hospital of Eastern Ontario, 401 Smyth Road, Ottawa, Ontario, Canada K1H 8L1
c Department of Pediatrics, University of Ottawa, 451 Smyth Road, Ottawa, Ontario, Canada, K1H 8M5
d Department of Epidemiology and Community Medicine, University of Ottawa, 451 Smyth Road, Ottawa, Ontario, Canada, K1H 8M5
e Department of Paediatrics, University of Toronto, 1 King’s College Circle, Toronto, Ontario, Canada, M5S 1A8
f Institute of Health Policy, Management and Evaluation, 155 College Street, University of Toronto, Toronto, Ontario, Canada, M5T 3M7
g Department of Medicine, University of Toronto, 1 King’s College Circle, Toronto, Ontario, Canada, M5S 1A8
h Centre for Inflammatory Bowel Disease, Mount Sinai Hospital, 600 University Avenue, Toronto, Ontario, Canada, M5G 1X5
i Department of Medicine, McMaster University, 1280 Main Street West, Hamilton, Ontario, Canada, L8S 4K1
j Department of Medicine, London Health Sciences Centre, University of Western Ontario, 339 Windermere Road, London, Ontario, Canada, N6G 2V4
k Department of Medicine, University of Ottawa, 451 Smyth Road, Ottawa, Ontario, Canada, K1H 8M5
l Ottawa Hospital Research Institute, 725 Parkdale Ave., Ottawa, Ontario, Canada, K1Y 4E9
m Department of Family Medicine, University of Ottawa, 451 Smyth Road, Ottawa, Ontario, Canada, K1H 8M5
Accepted 28 February 2014; Published online 26 April 2014

Abstract
Objective: Health administrative databases can be used to track disease incidence, outcomes, and care quality. Case validation is necessary to ensure accurate disease ascertainment using these databases. In this study, we aimed to validate adult-onset inflammatory bowel disease (IBD) identification algorithms.
Study Design and Setting: We used two large cohorts of incident patients from Ontario, Canada to validate algorithms. We linked information extracted from charts to health administrative data and compared the accuracy of various algorithms. In addition, we validated an algorithm to distinguish patients with Crohn’s from those with ulcerative colitis and assessed the adequate look-back period to distinguish incident from prevalent cases.
Results: Over 5,000 algorithms were tested. The most accurate algorithm to identify patients 18 to 64 years at diagnosis was five physician contacts or hospitalizations within 4 years (sensitivity, 76.8%; specificity, 96.2%; positive predictive value (PPV), 81.4%; negative predictive value (NPV), 95.0%). In patients ≥65 years at diagnosis, adding a pharmacy claim for an IBD-related medication improved accuracy.
Conclusion: Patients with adult-onset incident IBD can be accurately identified from within health administrative data. The validated algorithms will be applied to administrative data to expand the Ontario Crohn’s and Colitis Cohort to all patients with IBD in the province of Ontario. © 2014 Elsevier Inc. All rights reserved.
Keywords: Inflammatory bowel disease; Crohn’s; Ulcerative colitis; Epidemiology; Health administrative data; Routinely collected health data; Validation

Conflict of interest: The authors have no conflicts of interest to disclose.
* Corresponding author. Tel.: +1-416-737-7600x1516; fax: +1-416-738-4854.
E-mail address: ebenchimol@cheo.on.ca (E.I. Benchimol).
0895-4356/$ - see front matter © 2014 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jclinepi.2014.02.019

1. Introduction

Inflammatory bowel disease (IBD) is rising worldwide, with particularly high incidence in developed nations [1]. Canada has among the highest incidence of IBD in the world [1,2]. Ontario is Canada’s most populous province, and its single-payer health system entitles legal residents to universal access to all health-care services. Ontario’s health administrative databases are a large repository of all health-care encounters for every legal resident, and these data have been used to develop surveillance programs for multiple chronic diseases including diabetes [3,4] and asthma [5]. These data represent a unique opportunity to conduct population-based surveillance of patients with

IBD within a large jurisdiction. Administrative database-derived cohorts have been used to assess epidemiology, health services use, and outcomes in IBD in other jurisdictions as well [6–12]. Critical to the accuracy of such research, however, is the ability to accurately identify individuals with IBD using a rigorously validated algorithm comprising the best combination of health administrative data codes [13].

The Ontario Crohn’s and Colitis Cohort (OCCC), derived from health administrative data, is the largest ongoing population-based surveillance cohort of pediatric-onset IBD in the world, comprising all children living with IBD in Ontario, Canada [14]. An identification algorithm validated specifically in the pediatric age group was used to identify cases, and the cohort demonstrated exceptionally high incidence of pediatric IBD in Ontario [14]. Additionally, a study from the Canadian IBD Epidemiology database used health administrative data to report on Canada’s high incidence in five provinces [15]. This report excluded Ontario and Quebec, Canada’s most populous provinces, and used an identification algorithm validated in Manitoba only. IBD identification algorithms have been validated for other jurisdictions [16–19], each different from the Manitoba and Ontario algorithms. This indicated the importance of validation in the jurisdiction of the administrative data to which the algorithm will be applied. In addition, these algorithms were not validated in separate age groups (such as patients with adult-onset or elderly-onset IBD), nor were they re-validated in multiple cohorts, all identified as important aspects of algorithm validation by a recent systematic review [13].

The aims of this study were to (1) develop an algorithm to identify individuals with incident adult-onset IBD using data from charts in Ottawa, Ontario, Canada; and (2) validate and apply the case identification algorithm to the entire Ontario adult population to expand the OCCC to include adult patients with IBD. We also assessed the accuracy of other internationally validated algorithms in two distinct Ontario cohorts using chart review as the reference standard.

What is new?
• Algorithms to identify adults and elderly patients with inflammatory bowel disease (IBD) from within Ontario health administrative data have been validated, including algorithms to classify IBD subtype and to determine the look-back period required to distinguish incident from prevalent cases.
• Although some previously reported international IBD identification algorithms (such as the Manitoba algorithm) function well in Ontario, further refinement has improved accuracy of classification of incident cases of IBD.
• Algorithm validation should be an ongoing process. Validation, improvements, and refinement are essential when applying algorithms to new administrative databases, jurisdictions, time periods, and patient age groups.

2. Methods

2.1. Ethical issues

This study was approved by the research ethics boards of the Children’s Hospital of Eastern Ontario, The Ottawa Hospital, Hamilton Health Sciences, the London Health Sciences Centre, Mount Sinai Hospital (Toronto), and North York General Hospital. Ontario health administrative data are housed at the Institute for Clinical Evaluative Sciences (ICES; Toronto, Ontario, Canada), designated a prescribed entity under Section 45 of Ontario’s Personal Health Information Protection Act. Under this Section, prescribed entities are permitted to share personal health information with other prescribed entities (such as hospitals) or to link it across databases without obtaining informed consent [20]. Privacy of personal health data is regulated by the Information and Privacy Commissioner of Ontario, and approval of ICES activities was most recently reviewed in 2011 [21]. All projects linking to or using ICES data are evaluated and approved by the ICES privacy officer.

2.2. Administrative data sources

The databases used in this study included hospital discharge abstract data (DAD) mandatorily collected from all hospitals and reported to the Canadian Institute for Health Information (CIHI-DAD), billing claims for all physician services provided from the Ontario Health Insurance Plan (OHIP), and the Registered Persons Database (demographic data including region of residence). Hospital data before 2002 and all physician billing claims have diagnoses associated using codes from the International Classification of Diseases (ICD)-9 [22]. Hospitalizations after 2002 used ICD-10 codes [23]. The Ontario Drug Benefits database was used to obtain complete prescription information for patients ≥65 years at diagnosis. Prescription records for patients <65 years were only available for those on social assistance and were therefore not used for this study.

The Ottawa Hospital is a multi-campus medicare facility located in Ottawa, Ontario, Canada and serves as the largest adult referral center in Ontario for a population of 1.1 million. The Ottawa Hospital Data Warehouse (OHDW) was used to generate a reference standard for the algorithm development sample. This database contains administrative data on hospitalization, outpatient visits, emergency

department visits, day care visits, and procedures of all patients seen at the Ottawa Hospital. In addition, the OHDW contains a clinical data repository of laboratory results, radiology and pathology reports, and pharmacy data for all patients seen after April 1, 2002.

2.3. Generating the reference standard cohort

To identify a true-positive reference standard cohort of IBD cases, the OHDW was queried for cases of suspected IBD in patients ≥18 years seen at the Ottawa Hospital between October 1, 2001 and March 31, 2006, hereafter called the reference cohort. The reference cohort was generated by reviewing the chart of every patient seen at the Ottawa Hospital in this time period who met search criteria based on diagnosis codes in radiology, pathology, procedure, or health-care encounter records as detailed in Table 1. We also reviewed 100 random charts of the total 3,267 patients who had a prescription for an IBD-related medication but no IBD-related ICD code, radiology report, or pathology report (see Supplemental Table 1 at www.jclinepi.com for list of drug identification numbers used). We found only 1% of these patients were diagnosed with IBD, and none were incident cases. We therefore excluded pharmacy data from the search for feasibility reasons. Patients who met our search strategy and were included in the reference cohort were determined to be suspicious for IBD. Based on clinical experience, we determined that all patients who did not have a diagnostic, radiology, or histopathology reference to IBD were very unlikely to have IBD and therefore at low risk for misclassification. The charts of the reference cohort were reviewed by one of two chart reviewers (E.I.B. or a research coordinator). Patients were classified as having incident IBD (defined as diagnosis date between fiscal years (FYs) 2001 to 2005), prevalent IBD (defined as diagnosis outside the incident range), or as non-IBD. After training of the research coordinator by the principal investigator (E.I.B.), 40 charts were reviewed in duplicate to assess agreement. Agreement of the classification of IBD or non-IBD between the two reviewers was excellent (Kappa statistic 0.85 ± 0.08). Reference cohort patients were excluded from the study if they lacked a valid Ontario health card number, their data could not be linked by health number to Ontario health administrative data, the diagnosis of IBD or non-IBD was uncertain (after review by both chart extractors), or they emigrated from Ontario after diagnosis (and therefore full longitudinal follow-up was unavailable). Inclusion required availability of full health administrative data from 6 months before the date of diagnosis (derived from the chart) to the end of FY 2010.

Table 1. Search criteria to identify patients with suspected IBD from within the Ottawa Hospital Data Warehouse
A. All hospitalizations, outpatient visits, emergency department visits, day care visits, procedures with diagnostic codes for CD (ICD-9 555.x, ICD-10 K50.x) or UC (ICD-9 556.x, ICD-10 K51.x).
B. Radiology reports containing any of the following keywords: “Crohn”, “Crohn’s”, “colitis”, “inflammatory bowel disease”, “IBD”, or “ileitis”.
C. Histopathology reports containing any of the following keywords: “Crohn”, “Crohn’s”, “colitis”, “inflammatory bowel disease”, “IBD”, or “ileitis”.
Abbreviations: CD, Crohn’s disease; IBD, inflammatory bowel disease; ICD, International Classification of Diseases; UC, ulcerative colitis.

After linkage to the administrative databases, the reference cohort of chart-confirmed IBD and non-IBD cases was used to assess potential case ascertainment algorithms. Because the purpose of the algorithm was to identify incident IBD, incident cases only were used as the reference standard positives. Non-IBD chart extracted cases were used as the reference standard negatives. Cases of IBD identified as being prevalent by chart extraction (ie, those diagnosed before FY 2001 but with continuous health system enrollment between 1991 and 2010) were not used for the initial reference cohort, but were linked to the administrative data and used later to test look-back periods to distinguish incident from prevalent patients.

2.4. Developing an IBD case identification algorithm

We determined the diagnostic accuracy of different algorithms for incident IBD using various combinations of physician office and procedure billings and hospital records with the diagnosis codes for Crohn’s disease (CD) (ICD-9: 555.x; ICD-10: K50.x) or ulcerative colitis (UC) (ICD-9: 556.x; ICD-10: K51.x). We also tested the accuracy of previously described administrative database algorithms for IBD. We determined whether sigmoidoscopy or colonoscopy before diagnosis improved the diagnostic accuracy of the algorithm. We also determined accuracy of algorithms according to age subgroups, comparing accuracy of the algorithms in patients diagnosed between 18 and 64 years and ≥65 years. For those ≥65 years, we determined whether a prescription for an IBD-related medication (see Supplemental Table 1 at www.jclinepi.com) improved the accuracy of the algorithm. The final algorithm was selected and agreed upon by a committee of five experts in the fields of gastroenterology, health services research, epidemiology and administrative database research (A.G., D.G.M., D.R.M., E.I.B., G.C.N.). The committee decided on the algorithm with the highest possible positive predictive value (PPV), to minimize the false-positive rate, while maximizing sensitivity over the shortest possible duration to achieve accurate diagnosis. In addition, to determine the shortest available look-back period to distinguish incident from prevalent cases, we determined the span within which <5% of prevalent cases could have absence of health-care contacts for IBD.

2.5. Algorithm validation

Next, we validated the selected algorithm for patients from other regions of Ontario and those treated in a variety of

Fig. 1. Flow diagram of patients included and excluded in the algorithm development cohort derived from Ottawa Hospital Data Warehouse search.
IBD, inflammatory bowel disease; FY, fiscal year.

practice settings. We invited all adult gastroenterology practices listed on the Web site of the College of Physicians and Surgeons of Ontario to participate. In addition, an open letter that requested participation was e-mailed to all Ontario members of the Canadian Association of Gastroenterology. Finally, large Family Health Teams (group practices of family physicians) with electronic health records were contacted. In total, eight practices across Ontario participated in the validation project, including two Family Health Teams (comprising the practices of 21 independent family physicians), three community gastroenterologists and three gastroenterology tertiary care practices (London Health Sciences Center (London, Canada), Mount Sinai Hospital (Toronto, Canada), and McMaster University Health Centre (Hamilton, Canada)). Using searches of electronic health records, clinic lists, endoscopy lists and established IBD registries, patients diagnosed ≥18 years old with IBD between FY 2001 and 2005 were identified. Charts of these patients were reviewed by the same reviewers as in the algorithm development sample.

Patients identified as having incident IBD based on clinical, endoscopic, and histologic criteria [24–26] served as the positive reference sample. Patients were classified as having CD, UC, or IBD-type unclassified (IBD-U) based on the latest available information. In addition, approximately two randomly selected non-IBD patient charts from the same practice were reviewed for every IBD patient identified, and this sample acted as the negative reference standard. All patients (incident IBD and non-IBD) from all participating practices were combined to generate the validation cohort, which was then linked to health administrative data by health card number. Using the validation cohort, we calculated the diagnostic accuracy of the previously developed algorithm. In addition, the validation cohort was used to develop an algorithm best able to distinguish CD from UC and UC from CD. Patients who could not be classified as CD or UC by our algorithm were labeled as ‘unclassifiable’.

2.6. Statistical analysis

Using both cohorts, we constructed 2 × 2 tables to calculate diagnostic accuracy of the various algorithms. In the algorithm development cohort, we calculated sensitivity, specificity,

PPV, and negative predictive value (NPV) of various combinations of physician office and procedure billings and hospital records using the diagnosis codes for CD (ICD-9: 555.x; ICD-10: K50.x) or UC (ICD-9: 556.x; ICD-10: K51.x). In the algorithm validation cohort, we calculated sensitivity, specificity, positive likelihood ratio (LR+), and negative likelihood ratio (LR−), as the predictive values would not have meaning in a sample with a prevalence of IBD of approximately 0.33. Ninety-five percent confidence intervals (CIs) were calculated according to the efficient-score method corrected for continuity [27]. All analyses were conducted using SAS version 9.3 (SAS Institute Inc., Cary, USA).

3. Results

3.1. Algorithm development cohort

The search of the OHDW resulted in 5,847 charts for review. Of those, 1,747 patients were confirmed to have IBD (554 with incident IBD and 1,193 with prevalent IBD). A total of 3,330 patients did not have IBD. Details of included and excluded patients are reported in Fig. 1. Accuracy of international identification algorithms was tested against the reference standard (Table 2). In addition, 150 basic algorithms (without utilization of procedural billing codes) and 5,360 two-step algorithms (with variation of algorithm characteristics based on whether a patient underwent colonoscopy or sigmoidoscopy) were tested (see Appendix at www.jclinepi.com for diagnostic accuracy of all tested algorithms). Of published algorithms, the Manitoba algorithm demonstrated the highest PPV while maintaining adequate sensitivity. The algorithm performed well in Ontario adults aged 18 to 64 years at diagnosis, with sensitivity 79.4% (95% CI: 75.5, 82.8), specificity 95.8% (95% CI: 94.9, 96.5), PPV 80.2% (95% CI: 76.4, 83.6), and NPV 95.6% (95% CI: 94.6, 96.4).

Importantly, the Manitoba algorithm did not perform well in adults aged ≥65 years at diagnosis with sensitivity 59.3% (95% CI: 45.1, 72.1), specificity 98.2% (95% CI: 97.2, 98.8), PPV 58.2% (95% CI: 44.1, 71.1), and NPV 98.3% (95% CI: 97.3, 98.9). The Manitoba algorithm was originally validated in prevalent cases of IBD; investigators did not define a time window during which patients needed to have their cluster of five codes to be classified as having IBD. Because our aim was to identify incident cases, we determined the minimum length of time required for a patient to qualify with the Manitoba algorithm. We tested qualification windows of varying lengths (Table 3). Reducing the time window to 4 years did not significantly reduce the sensitivity in patients ≥18 years, changing it from 77.4% to 75.1%. PPV was also well maintained, with a rise from 78.7% to 80.5%. Because the sensitivity dropped steeply in shorter windows, the selected identification algorithm for Ontario adults with incident IBD between 18 and 64 years (hereafter referred to as the Ontario algorithm) used a similar pattern to the Manitoba algorithm (five physician contacts or hospitalizations), over the newly defined 4-year time window.
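
The case definition described above (five physician contacts or hospitalizations with an IBD diagnosis code within a 4-year window) can be illustrated with a minimal Python sketch. This is not the SAS code used in the study. The function name `meets_case_definition`, the representation of contacts as a list of dates, and the choice to measure the window in days (4 × 365.25) are assumptions made for illustration only.

```python
from datetime import date


def meets_case_definition(contact_dates, min_contacts=5, window_years=4):
    """Return True if any `min_contacts` IBD-coded contacts fall within `window_years`.

    `contact_dates` is a list of datetime.date objects, one per physician
    contact or hospitalization carrying a CD or UC diagnosis code.
    Assumption: the window is measured in days (window_years * 365.25).
    """
    dates = sorted(contact_dates)
    window_days = round(window_years * 365.25)
    # Slide over every run of `min_contacts` consecutive contacts and check
    # whether the first and last contact of the run fit inside the window.
    for i in range(len(dates) - min_contacts + 1):
        span_days = (dates[i + min_contacts - 1] - dates[i]).days
        if span_days <= window_days:
            return True
    return False


# Hypothetical patient with five IBD-coded contacts inside three years.
clustered = [date(2002, 1, 10), date(2002, 6, 1), date(2003, 3, 15),
             date(2004, 2, 1), date(2004, 11, 20)]
# Hypothetical patient with five contacts spread over six years.
sparse = [date(2001, 1, 1), date(2002, 7, 1), date(2004, 1, 1),
          date(2005, 7, 1), date(2007, 1, 1)]

print(meets_case_definition(clustered))  # True
print(meets_case_definition(sparse))     # False
```

Because the contacts are sorted, it is enough to compare the first and last contact of each run of five, so every candidate window length from Table 3 can be tested by changing `window_years` alone.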

Table 2. Accuracy of various international algorithms using the Ontario algorithm development and validation cohorts

                                                                                        Algorithm development cohort                                      Algorithm validation cohort
Cohort                              Algorithm                                           Sensitivity (%)  Specificity (%)  PPV (%)  NPV (%)  LR+   LR−     Sensitivity (%)  Specificity (%)  LR+    LR−
Denmark [16]                        Any single diagnostic code for IBD                  96.8             82.8             49.6     99.3     5.6   0.04    99.8             90.4             10.4   0.002
                                    (outpatient or hospitalization)
UK (General Practice Research       Any hospitalization                                 82.2             96.1             78.6     96.8     21.2  0.18    90.9             99.0             86.7   0.092
Database) [17]
Manitoba [30]                       (A) For residents of the province for ≥2 years:     77.4             96.6             78.7     96.3     22.5  0.23    93.8             99.0             89.5   0.064
                                    five outpatient or hospitalizations
                                    (B) For residents of the province for <2 years,
                                    three outpatient or hospitalizations
Ontario, pediatric [14]             (A) If scoped: four outpatient or two               78.2             96.2             77.1     96.4     20.6  0.22    94.2             98.9             82.4   0.059
                                    hospitalizations within 3 y
                                    (B) If not scoped: seven outpatient or three
                                    hospitalizations within 3 y
Manitoba short modification [30]    3+ outpatient or hospitalizations within 2 y        79.2             94.9             71.7     96.5     20.2  0.20    94.1             98.3             54.9   0.06
Kaiser Permanente, California [19]  2+ outpatient                                       86.5             91.6             62.9     97.6     10.3  0.15    99.6             91.5             11.6   0.005
Alberta [18]                        Four outpatient or one hospitalization within 2 y   89.9             94.4             72.4     98.3     16.0  0.11    97.6             98.1             51.2   0.025

Abbreviations: IBD, inflammatory bowel disease; LR+, positive likelihood ratio; LR−, negative likelihood ratio; NPV, negative predictive value; PPV, positive predictive value.
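
The accuracy measures reported in Table 2 all derive from a 2 × 2 table of algorithm result against chart review. The following Python sketch shows the standard definitions. The function names and the counts passed to `diagnostic_accuracy` are hypothetical, since the study's own 2 × 2 tables are in the supplemental material. Recomputing likelihood ratios from the rounded percentages in the table can differ slightly from the published values, which were presumably calculated from unrounded counts.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    # LR+ is undefined when specificity is exactly 1; report it as infinity.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def diagnostic_accuracy(tp, fp, fn, tn):
    """Compute accuracy measures from 2 x 2 table counts.

    tp/fn: chart-confirmed IBD cases flagged / missed by the algorithm.
    fp/tn: chart-confirmed non-IBD patients flagged / not flagged.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos, lr_neg = likelihood_ratios(sensitivity, specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),   # depends on prevalence in the sample
        "npv": tn / (tn + fn),   # depends on prevalence in the sample
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


# Denmark row of Table 2 (development cohort): sensitivity 96.8%, specificity 82.8%.
lr_pos, lr_neg = likelihood_ratios(0.968, 0.828)
print(round(lr_pos, 1), round(lr_neg, 2))  # 5.6 0.04

# Hypothetical counts, for illustration only.
print(diagnostic_accuracy(tp=80, fp=20, fn=20, tn=380))
```

Likelihood ratios depend only on sensitivity and specificity, which is why they remain interpretable in the validation cohort, where the sampled prevalence of IBD (approximately 0.33) makes PPV and NPV uninformative.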

Table 3. Diagnostic accuracies of various time windows using the Manitoba algorithm

                  Algorithm development cohort                                      Algorithm validation cohort
Time window (y)   Sensitivity (%)  Specificity (%)  PPV (%)  NPV (%)  LR+   LR−     Sensitivity (%)  Specificity (%)  LR+    LR−
1                 60.7             97.7             81.8     93.8     26.7  0.40    80.1             99.2             104.3  0.20
2                 70.0             97.3             81.2     95.1     25.6  0.31    87.9             99.0             91.5   0.12
3                 73.5             97.1             80.8     95.6     24.9  0.27    91.3             99.0             95.0   0.088
4                 75.1             96.9             80.5     95.9     24.5  0.26    92.0             98.9             87.0   0.081
5                 76.2             96.7             79.8     96.0     23.4  0.26    93.1             98.9             88.1   0.07
6                 77.0             96.7             79.5     96.1     23.0  0.24    93.8             98.9             88.7   0.063
7                 77.1             96.6             79.1     96.2     22.4  0.24    93.8             98.9             88.7   0.063
8                 77.4             96.6             79.2     96.2     22.5  0.23    93.8             98.9             88.7   0.063
9                 77.4             96.6             79.2     96.2     22.5  0.23    93.8             98.9             88.7   0.063
10                77.4             96.6             79.2     96.2     22.5  0.23    93.8             98.9             88.7   0.063

Abbreviations: LR+, positive likelihood ratio; LR−, negative likelihood ratio; NPV, negative predictive value; PPV, positive predictive value.
The time window refers to the number of years within which a patient must qualify with the algorithm (eg, five health-care contacts within 1, 2, 3 years, and so forth).
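
The 95% CIs quoted in the text were calculated with the efficient-score method corrected for continuity [27]. The sketch below assumes that this corresponds to the Wilson score interval with continuity correction; the function name is hypothetical. The counts 32 of 54 are also an assumption: they were chosen because they are consistent with the reported sensitivity of 59.3% in patients aged ≥65 years, and they are not taken from the paper.

```python
import math


def wilson_cc_interval(successes, n, z=1.959964):
    """Continuity-corrected Wilson score interval for a proportion.

    Assumption: this is the interval meant by the "efficient-score method
    corrected for continuity". Returns (lower, upper) as proportions.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    q = 1 - p
    denom = 2 * (n + z * z)
    if successes == 0:
        lower = 0.0
    else:
        root = math.sqrt(z * z - 2 - 1 / n + 4 * p * (n * q + 1))
        lower = (2 * n * p + z * z - 1 - z * root) / denom
    if successes == n:
        upper = 1.0
    else:
        root = math.sqrt(z * z + 2 - 1 / n + 4 * p * (n * q - 1))
        upper = (2 * n * p + z * z + 1 + z * root) / denom
    return max(0.0, lower), min(1.0, upper)


# Assumed counts: 32 of 54 elderly incident cases identified (59.3%).
low, high = wilson_cc_interval(32, 54)
print(round(100 * low, 1), round(100 * high, 1))  # 45.1 72.1
```

Under these assumed counts the sketch reproduces the interval reported for sensitivity in patients aged ≥65 years (45.1, 72.1), which suggests, but does not confirm, that the same formula was used.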

To improve the Ontario algorithm accuracy in elderly patients, we tested the addition of one pharmacy claim for an IBD-related medication (Table 4). This improved the accuracy of the algorithm without sacrificing sensitivity: sensitivity 59.3% (95% CI: 45.1, 72.1), specificity 99.0% (95% CI: 98.2, 99.4), PPV 71.1% (95% CI: 55.5, 83.2), NPV 98.3% (95% CI: 97.3, 98.9). We tested the Ontario algorithm without 5-aminosalicylate (ASA) or sulfasalazine in the list of IBD-related medications because of their potential use in non-IBD conditions, which might be confused with IBD in administrative data (eg, self-limited colitis, microscopic colitis). This led to an unacceptable decrease in sensitivity to 53.7% (95% CI: 39.7, 67.2), with specificity 99.3% (95% CI: 98.6, 99.7), PPV 76.3% (95% CI: 59.4, 88.0), and NPV 98.0% (95% CI: 97.1, 98.7). Therefore, the final algorithm agreed upon by the panel was dependent on age. For patients diagnosed ≥65 years, five physician contacts or hospitalizations plus a prescription drug claim for an IBD-related medication within 4 years was used to classify patients as having incident IBD.

To determine the adequate look-back period with no IBD claims required to distinguish incident from prevalent cases of IBD, we limited the prevalent cases of IBD from the chart extraction cohort to those with full longitudinal data available from FY 1991 to 2010 and who had more than one claim for IBD (n = 553). The number of years between consecutive contacts with diagnostic code for IBD is demonstrated in Supplemental Table 2 at www.jclinepi.com. Only 24 of 553 patients (4.3%) had 8 or more years between visits with diagnostic codes for IBD, indicating that >95% of patients were seen at least twice in an 8-year time period. Therefore, cases were determined to be incident if they had no claims for IBD in the 8 years before the algorithm cluster of codes.

3.2. Ontario algorithm validation by chart review

From the eight practices across Ontario that participated in the validation portion, 1,636 charts were reviewed. Of those, 1,515 (93%) charts (464 with incident IBD, 1,051 without IBD) could be linked to health administrative data and met inclusion criteria. Of the included IBD patients, 206 (44%) had CD and 242 (52%) had UC, with the remaining being IBD-U. The performance of the various algorithms compared with the validation cohort is demonstrated in Tables 2–4. The diagnostic accuracy of all tested algorithms is presented in the Appendix. The selected Ontario algorithm (five physician contacts or hospitalizations within 4 years) identified patients 18 to 64 years with IBD with sensitivity of 92.3% (95% CI: 89.2, 94.5), specificity 99.1% (95% CI: 98.1, 99.6), LR+ 105.3 (95% CI: 50.3, 220.3), and LR− 0.078

Table 4. Diagnostic accuracy of the Ontario identification algorithm (five physician claims and/or hospitalizations within 4 years), stratified by age at diagnosis

                                                Algorithm development cohort                                      Algorithm validation cohort
Age at diagnosis                                Sensitivity (%)  Specificity (%)  PPV (%)  NPV (%)  LR+   LR−     Sensitivity (%)  Specificity (%)  LR+    LR−
18–64 y                                         76.8             96.2             81.4     95.0     20.0  0.24    92.3             99.1             105.3  0.078
≥65 y                                           59.3             98.5             64.0     98.2     39.4  0.41    86.4             98.2             47.5   0.14
≥65 y with medication claim                     59.3             99.0             71.1     98.3     57.5  0.41    78.3             98.2             44.0   0.22
≥65 y with medication claim (excluding 5-ASAs)  53.7             99.3             76.3     98.0     75.2  0.47    52.2             98.8             44.0   0.48

Abbreviations: 5-ASA, 5-aminosalicylate; IBD, inflammatory bowel disease; LR+, positive likelihood ratio; LR−, negative likelihood ratio; NPV, negative predictive value; PPV, positive predictive value.
For elderly patients, the algorithm accuracy with and without IBD-related medication claim demonstrates greater accuracy with the medication claim included.
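
The look-back analysis reported in Section 3.1 (the shortest span within which fewer than 5% of prevalent cases had no IBD contact) can be sketched in Python as follows. The function names, the dictionary layout, and the toy data are hypothetical and are not the study's data or code; gaps are converted to years by dividing by 365.25 as a simplifying assumption.

```python
from datetime import date, timedelta


def longest_gap_years(contact_dates):
    """Longest interval, in years, between consecutive IBD-coded contacts."""
    dates = sorted(contact_dates)
    return max((b - a).days / 365.25 for a, b in zip(dates, dates[1:]))


def shortest_lookback(patients, candidate_years=range(1, 11), max_fraction=0.05):
    """Smallest look-back (years) for which fewer than `max_fraction` of
    prevalent patients have a contact-free gap at least that long.

    `patients` maps a patient identifier to a list of contact dates; patients
    with fewer than two contacts are skipped, mirroring the restriction to
    prevalent cases with more than one claim.
    """
    gaps = [longest_gap_years(d) for d in patients.values() if len(d) > 1]
    for years in candidate_years:
        fraction = sum(gap >= years for gap in gaps) / len(gaps)
        if fraction < max_fraction:
            return years
    return None  # no candidate look-back period met the threshold


# Toy data (hypothetical): 89 patients seen about every 6 months, 10 with a
# gap of about 2.5 years, and 1 with a gap of about 7.5 years.
start = date(2000, 1, 1)
toy = {f"frequent-{i}": [start, start + timedelta(days=182)] for i in range(89)}
toy.update({f"gap-{i}": [start, start + timedelta(days=913)] for i in range(10)})
toy["long-gap"] = [start, start + timedelta(days=2739)]

print(shortest_lookback(toy))  # 3
```

In the study's prevalent cohort, 24 of 553 patients (4.3%) had 8 or more years between IBD-coded visits, so 8 years was the look-back period that met this kind of threshold.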

(95% CI: 0.056, 0.108). Adding a prescription claim to the algorithm for those diagnosed ≥65 years resulted in sensitivity of 78.3% (95% CI: 55.8, 91.7), specificity 98.2% (95% CI: 96.0, 99.3), LR+ 44.0 (95% CI: 19.3, 100.0), and LR− 0.221 (95% CI: 0.102, 0.481). The 2 × 2 tables from which diagnostic accuracy was calculated for the adult and elderly algorithms are presented in Supplemental Table 3 at www.jclinepi.com.

Various algorithms to distinguish CD from UC patients are presented in Table 5. The best algorithm to assign IBD type was having five of the last nine diagnostic codes for either CD or UC. If five of the last nine outpatient claims were for CD, the patient was accurately assigned a CD diagnosis by chart extraction 95.6% of the time. If five of the last nine claims were for UC, the patient was assigned a UC diagnosis by chart extraction 87.1% of the time. This resulted in an overall accuracy of 91.1% for assigning the correct IBD subtype.

Table 5. Algorithms to accurately assign IBD subtype (CD or UC)

Algorithm              Accuracy in assigning IBD subtype (%)  Accuracy for CD patients (%)  Accuracy for UC patients (%)
2 of last 3 codes      88.2                                   93.2                          83.9
3 of last 3 codes      81.3                                   89.8                          74.0
3 of last 4 codes      86.8                                   93.7                          80.9
4 of last 4 codes      79.8                                   89.3                          71.5
3 of last 5 codes      89.6                                   95.0                          85.0
4 of last 5 codes      86.1                                   94.0                          79.4
5 of last 5 codes      77.1                                   86.5                          69.1
4 of last 6 codes      88.7                                   94.4                          83.8
5 of last 6 codes      85.0                                   92.4                          78.6
6 of last 6 codes      75.1                                   85.3                          66.4
4 of last 7 codes      90.7                                   95.9                          86.2
5 of last 7 codes      88.4                                   93.9                          83.6
6 of last 7 codes      83.6                                   91.3                          76.9
7 of last 7 codes      73.2                                   82.7                          64.9
5 of last 8 codes      90.2                                   95.8                          85.3
6 of last 8 codes      86.8                                   92.7                          81.7
7 of last 8 codes      82.0                                   90.1                          74.8
8 of last 8 codes      72.4                                   81.8                          64.2
5 of last 9 codes*     91.1                                   95.6                          87.1
6 of last 9 codes      89.0                                   94.5                          84.2
7 of last 9 codes      86.2                                   92.4                          80.9
8 of last 9 codes      82.1                                   91.3                          74.2
9 of last 9 codes      71.4                                   82.0                          62.2
6 of last 10 codes     89.5                                   95.5                          84.2
7 of last 10 codes     87.3                                   93.2                          82.3
8 of last 10 codes     84.7                                   92.1                          78.3
9 of last 10 codes     80.7                                   90.3                          72.4
10 of last 10 codes    69.4                                   81.8                          58.6

Abbreviations: CD, Crohn’s disease; IBD, inflammatory bowel disease; UC, ulcerative colitis.
* The algorithm selected for usage (bolded in the original table).

4. Discussion

Health administrative data provide the opportunity to conduct high-quality, population-based chronic disease surveillance. However, the data must be validated for accuracy, with misclassification bias representing a threat to study validity. We used two cohorts of adults with IBD derived from extensive chart review to test the accuracy of thousands of identification algorithms, including those published in the literature and used internationally as well as unpublished algorithms. In doing so, we found that the algorithm previously validated in Manitoba, Canada (five physician contacts or hospitalizations with associated diagnostic codes for CD or UC) was most accurate to identify adults aged 18 to 64 years at diagnosis. We shortened the qualifying period, creating an Ontario algorithm to specifically identify incident cases. However, improved accuracy for adults diagnosed aged ≥65 years was achieved when a pharmacy claim was added to the Ontario algorithm. We also validated an algorithm to differentiate CD from UC patients. Finally, we were able to define an adequate look-back period to distinguish incident from prevalent IBD.

Although the Manitoba algorithm has been applied to various cohorts of patients with IBD to describe epidemiology and health services utilization [15,28], this is the first study to validate its accuracy in a separate population. Some studies have applied the shortened modification of the Manitoba algorithm (three contacts or hospitalizations within 2 years) when long-term longitudinal data are unavailable, such as in databases of clients of health maintenance organizations [11,29]. Using Ontario patients as the reference standard, we found that this shortened algorithm was less accurate and resulted in lower PPV. Application of this algorithm to Ontario health administrative data would result in overestimation of incidence and prevalence.

It is not surprising that the Manitoba algorithm worked reasonably well among Ontario adults. The two provinces share similarities including similar health systems, based on fee-for-service physician billing models and professional discharge coders trained by CIHI. Although our previously published pediatric algorithm was more accurate to identify children diagnosed with IBD in Ontario [14], specialist pediatric care in the province is primarily based on tertiary care hospital alternate funding plans and not a fee-for-service model. Therefore, adult care models in Ontario may have more pattern similarities with Manitoba adult care than they do with Ontario pediatric care. Interestingly, Alberta has a similar health system to Ontario and Manitoba, but the Alberta algorithm was not found to be as accurate in Ontario. The Alberta validation study found that addition of Ambulatory Care Classification System (ACCS) codes produced more accurate identification of IBD patients in Alberta administrative data [18]. Unfortunately, ACCS codes are not available in Ontario, and therefore, our Ontario algorithm included only OHIP and CIHI data. Researchers in Alberta have emphasized the importance of validation and publication of multiple algorithm options [18]. This allows for the opportunity to use certain algorithms in specific circumstances. For example, if study design requires that very few false-positive non-IBD patients contaminate the IBD cohort, researchers could use
a more specific algorithm than the standard identification algorithm, knowing that it may
not be as sensitive. We have therefore included all tested algorithms in the Appendix for
future reference. Other IBD identification algorithms have been validated internationally
(Table 2), but did not function as well in Ontario. IBD care in the Kaiser Permanente
system and in Denmark is provided by salary-supported physicians rather than through a
fee-for-service system. Finally, the General Practice Research Database (now known as the
Clinical Practice Research Database) uses primary physician coding, does not include
specialist physician visits or hospitalizations, and employs some quality control in its
data entry. Therefore, it is not surprising that algorithms developed in those regions did
not transfer well to Ontario patients. These findings demonstrate that the accuracy of
routinely collected health data identification algorithms depends on various factors such
as patient age, health system structure, and availability of data. Algorithms should
therefore be validated before application to databases outside their original validation
cohort. In addition, algorithms should be considered fluid in their accuracy across time
and jurisdictions. We were able to modify and improve upon the accuracy of the Manitoba
algorithm using the charts of Ontario patients as the reference standard. Therefore,
ongoing testing should be conducted to continually improve on previous administrative data
algorithm validation research.

The addition of procedural (colonoscopy or sigmoidoscopy) billing codes to the OCCC
pediatric algorithm was demonstrated to improve the algorithm accuracy [14]. These codes
did not improve identification of adults with IBD in Ontario. Approximately 5% of all
adults did not have a record of endoscopy in the administrative data, a proportion similar
to that of children diagnosed with IBD. The reason for the improvement of coding accuracy
in children (but not adults) who were scoped may be the greater uncertainty of the IBD
diagnosis in children before endoscopy. Another strategy used to improve identification
techniques was to add a prescription record to the algorithm. This was the strategy used in
the aforementioned American studies; however, validation was not performed [11,29]. We
found that the Manitoba algorithm was more accurate in patients ≥65 years at diagnosis with
IBD when a prescription for an IBD-related medication was added. A sensitivity analysis
assessing whether exclusion of 5-ASA prescriptions improved accuracy demonstrated improved
PPV but an unacceptable drop in sensitivity. Use of prescription records may also improve
accuracy of the algorithm in patients aged 18 to 64 years at diagnosis. Unfortunately,
prescription records are not available for the entire Ontario population. Identification
algorithms should be re-assessed and re-validated as regional health patterns change and
new databases become available.

This algorithm validation study has a number of other limitations. First, confirmation of
the diagnosis of IBD, CD, and UC was based on abstraction of patient charts, which is
subject to the uncertainty and bias of retrospective review. The published algorithms
assessed in this study were originally validated using cohorts comprising both incident and
prevalent cases. However, we determined their accuracy in an incident-case cohort, and
therefore, if the algorithms did not function as well in our cohort, this does not suggest
that the algorithm would not function well with a prevalent-case reference standard cohort,
nor does it suggest that the algorithm was not accurate in the jurisdiction to which it was
applied. However, these algorithms have been used to describe incidence in multiple
jurisdictions [15], and therefore, we felt it valuable to determine their accuracy for
incident cases in Ontario. We are also uncertain of the accuracy of these algorithms to
identify prevalent Ontario cases. Optimally, a separate algorithm to identify prevalent
cases could be validated. Unfortunately, we were unable to confirm the IBD diagnosis based
on pathology in many prevalent cases, due to the many years that had passed since the
original diagnosis. We therefore elected to concentrate on developing an algorithm to
identify confirmed cases based on strict clinical, radiology, and pathologic criteria. In
addition, the fact that the Manitoba algorithm (validated in prevalent Manitoba cases
previously [30]) was the most accurate was reassuring that this algorithm would function
well for incident and prevalent cases.

The issue of predictive values warrants discussion. PPV and NPV have been demonstrated to
be very important in case ascertainment in epidemiologic studies [31]. However, predictive
values change depending on the prevalence of disease in the population. This is
particularly important in algorithm validation studies, as changing predictive values will
result in changed prevalence estimates and health-care costs [32]. Despite the importance
of estimating predictive values using validation cohorts with disease prevalence similar to
the prevalence in the general population, this is rarely done in previously published
validation studies [13]. In fact, no previous IBD algorithm validation studies accomplished
this feat, with the exception of the OCCC pediatric algorithm validation [14]. The
decentralized nature of the adult health system in Ontario, with patients seen at the
Ottawa Hospital for other diagnoses potentially treated in other centers for their IBD,
made it difficult to accomplish this goal in this study. One strategy may have been to
include all patients seen at the Ottawa Hospital who were not included in the reference
cohort in the true negative standard. However, patients with IBD seen at the Ottawa
Hospital for reasons other than their IBD may not have been identified by searches of
diagnostic codes, pathology reports, or radiology tests. The predictive values reported in
the algorithm development cohort are based on the prevalence of IBD in a population of
patients with suspicion of having IBD based on mention of IBD in their health record.
Therefore, the PPV and NPV reported in this study are based on a higher prevalence
population than the one in 150
prevalence reported in other Canadian studies [33]. However, when one assumes that patients
without a diagnostic code, radiology test, or histopathologic sample are highly unlikely to
have IBD, our predictive values likely reflect those of the algorithm when applied to the
general population. We tested this theory by assessing the charts of 100 patients who had
an IBD-related pharmaceutical prescription at the Ottawa Hospital, but did not have mention
of IBD in their record. We found that <1% of those patients truly had IBD. Therefore, we
were likely missing very few IBD patients whose administrative data records may have been
confused with non-IBD patients. Application of our algorithm to Ontario-wide data will
therefore accurately identify IBD patients and distinguish those non-IBD patients who may
be misclassified as having IBD. Those patients who are not at risk of being misclassified
(ie, did not have either one IBD health-care contact, a radiology report mentioning IBD, or
a pathology report mentioning IBD) can safely be classified as non-IBD patients with <1%
risk of truly having IBD.

In summary, using two cohorts of incident IBD patients as reference standards, we have
developed and validated Ontario-based algorithms to best identify adults and elderly
patients with IBD within health administrative data to expand the OCCC to all ages and
create a large population-based surveillance cohort of Ontario IBD patients. This algorithm
builds on previous validation studies and emphasizes the role of iterative improvement in
administrative data classification algorithms to reduce the risk of misclassification bias.

Acknowledgments

The authors wish to thank the physicians who helped organize the chart validation portion
of this study: Drs. David Kaplan (North York General Hospital, Toronto), Andrew Kujavsky
(Ottawa), Helena Lau (Oakville), Pardeep Nijhawan (Richmond Hill), and Brian Stotland
(Newmarket). We appreciate the efforts of Paul Sinclair and the Canadian Association of
Gastroenterology in disseminating the invitation to participate to Ontario
gastroenterologists. We are also grateful to Jane Earle and Zack Muqtadir, research
coordinators who helped with the chart review. This research was funded by a Junior Faculty
Development Grant from the American College of Gastroenterology and was made possible with
the support of the Institute for Clinical Evaluative Sciences, which receives funding from
the Ontario Ministry of Health and Long-Term Care (MOHLTC). The results and conclusions are
those of the authors; no official endorsement by the Ontario MOHLTC should be inferred.
E.I.B. is supported by a Career Development Award from the Canadian Child Health Clinician
Scientist Program, a Canadian Institutes of Health Research (CIHR) strategic training
program. A.G. is supported by a CIHR Chair in Reproductive and Child Health Services and
Policy Research. G.C.N. is a recipient of New Investigator Awards from CIHR and the Crohn’s
and Colitis Foundation of Canada.

Appendix

Supplementary Data

Supplementary data associated with this article can be found, in the online version, at
http://dx.doi.org/10.1016/j.jclinepi.2014.02.019.

References

[1] Molodecky NA, Soon IS, Rabi DM, Ghali WA, Ferris M, Chernoff G, et al. Increasing
incidence and prevalence of the inflammatory bowel diseases with time, based on systematic
review. Gastroenterology 2012;142:46–54.e42.
[2] Benchimol EI, Fortinsky KJ, Gozdyra P, Van den Heuvel M, Van Limbergen J, Griffiths AM.
Epidemiology of pediatric inflammatory bowel disease: a systematic review of international
trends. Inflamm Bowel Dis 2011;17:423–39.
[3] Guttmann A, Nakhla M, Henderson M, To T, Daneman D, Cauch-Dudek K, et al. Validation of
a health administrative data algorithm for assessing the epidemiology of diabetes in
Canadian children. Pediatr Diabetes 2010;11:122–8.
[4] Hux JE, Ivis F, Flintoft V, Bica A. Diabetes in Ontario: determination of prevalence
and incidence using a validated administrative data algorithm. Diabetes Care 2002;25:512–6.
[5] To T, Dell S, Dick PT, Cicutto L, Harris JK, MacLusky IB, et al. Case verification of
children with asthma in Ontario. Pediatr Allergy Immunol 2006;17:69–76.
[6] Benchimol EI, Guttmann A, To T, Rabeneck L, Griffiths AM. Changes to surgical and
hospitalization rates of pediatric inflammatory bowel disease in Ontario, Canada
(1994-2007). Inflamm Bowel Dis 2011;17:2153–61.
[7] Benchimol EI, To T, Griffiths AM, Rabeneck L, Guttmann A. Outcomes of pediatric
inflammatory bowel disease: socioeconomic status disparity in a universal-access healthcare
system. J Pediatr 2011;158:960–967.e1–4.
[8] Bernstein CN, Rawsthorne P, Cheang M, Blanchard JF. A population-based case control
study of potential risk factors for IBD. Am J Gastroenterol 2006;101:993–1002.
[9] Herrinton LJ, Liu L, Lewis JD, Griffin PM, Allison J. Incidence and prevalence of
inflammatory bowel disease in a Northern California managed care organization, 1996-2002.
Am J Gastroenterol 2008;103:1998–2006.
[10] Kappelman MD, Horvath-Puho E, Sandler RS, Rubin DT, Ullman TA, Pedersen L, et al.
Thromboembolic risk among Danish children and adults with inflammatory bowel diseases: a
population-based nationwide study. Gut 2011;60:937–43.
[11] Kappelman MD, Rifas-Shiman SL, Porter CQ, Ollendorf DA, Sandler RS, Galanko JA, et al.
Direct health care costs of Crohn’s disease and ulcerative colitis in US children and
adults. Gastroenterology 2008;135:1907–13.
[12] Nguyen GC, Laveist TA, Harris ML, Wang MH, Datta LW, Brant SR. Racial disparities in
utilization of specialist care and medications in inflammatory bowel disease. Am J
Gastroenterol 2010;105:2202–8.
[13] Benchimol EI, Manuel DG, To T, Griffiths AM, Rabeneck L, Guttmann A. Development and
use of reporting guidelines for assessing the quality of validation studies of health
administrative data. J Clin Epidemiol 2011;64:821–9.
[14] Benchimol EI, Guttmann A, Griffiths AM, Rabeneck L, Mack DR, Brill H, et al.
Increasing incidence of paediatric inflammatory bowel
disease in Ontario, Canada: evidence from health administrative data. Gut 2009;58:1490–7.
[15] Bernstein CN, Wajda A, Svenson LW, MacKenzie A, Koehoorn M, Jackson M, et al. The
epidemiology of inflammatory bowel disease in Canada: a population-based study. Am J
Gastroenterol 2006;101:1559–68.
[16] Fonager K, Sorensen HT, Rasmussen SN, Moller-Petersen J, Vyberg M. Assessment of the
diagnoses of Crohn’s disease and ulcerative colitis in a Danish hospital information
system. Scand J Gastroenterol 1996;31:154–9.
[17] Lewis JD, Brensinger C, Bilker WB, Strom BL. Validity and completeness of the General
Practice Research Database for studies of inflammatory bowel disease. Pharmacoepidemiol
Drug Saf 2002;11:211–8.
[18] Rezaie A, Quan H, Fedorak RN, Panaccione R, Hilsden RJ. Development and validation of
an administrative case definition for inflammatory bowel diseases. Can J Gastroenterol
2012;26:711–7.
[19] Weng X, Liu L, Barcellos LF, Allison JE, Herrinton LJ. Clustering of inflammatory
bowel disease with immune mediated diseases among members of a Northern California-managed
care organization. Am J Gastroenterol 2007;102:1429–35.
[20] Privacy at the Institute for Clinical Evaluative Sciences (ICES). 2011. Available at
http://www.ices.on.ca/webpage.cfm?site_id=1&org_id=119. Accessed February 24, 2014.
[21] Cavoukian A. Review of the report on the practices and procedures of the Institute for
Clinical Evaluative Sciences. 2011. Available at
http://www.ices.on.ca/file/IPC_approval_letter_2011.pdf. Accessed February 24, 2014.
[22] International Classification of Diseases, Ninth Revision, Clinical Modification
(ICD-9-CM). 2012. Available at http://www.cdc.gov/nchs/icd/icd9cm.htm. Accessed May 27,
2013.
[23] International Classification of Diseases, Tenth Revision, Clinical Modification
(ICD-10-CM). 2012. Available at http://www.cdc.gov/nchs/icd.htm. Accessed May 27, 2013.
[24] Jenkins D, Balsitis M, Gallivan S, Dixon MF, Gilmour HM, Shepherd NA, et al.
Guidelines for the initial biopsy diagnosis of suspected chronic idiopathic inflammatory
bowel disease. The British Society of Gastroenterology Initiative. J Clin Pathol
1997;50:93–105.
[25] Mowat C, Cole A, Windsor A, Ahmad T, Arnott I, Driscoll R, et al. Guidelines for the
management of inflammatory bowel disease in adults. Gut 2011;60:571–607.
[26] Panes J, Bouhnik Y, Reinisch W, Stoker J, Taylor SA, Baumgart DC, et al. Imaging
techniques for assessment of inflammatory bowel disease: joint ECCO and ESGAR
evidence-based consensus guidelines. J Crohns Colitis 2013;7:556–85.
[27] Newcombe RG. Two-sided confidence intervals for the single proportion: comparison of
seven methods. Stat Med 1998;17:857–72.
[28] Lowe AM, Roy PO, B-Poulin M, Michel P, Bitton A, St-Onge L, et al. Epidemiology of
Crohn’s disease in Quebec, Canada. Inflamm Bowel Dis 2009;15:429–35.
[29] Kappelman MD, Rifas-Shiman SL, Kleinman K, Ollendorf D, Bousvaros A, Grand RJ, et al.
The prevalence and geographic distribution of Crohn’s disease and ulcerative colitis in the
United States. Clin Gastroenterol Hepatol 2007;5:1424–9.
[30] Bernstein CN, Blanchard JF, Rawsthorne P, Wajda A. Epidemiology of Crohn’s disease and
ulcerative colitis in a central Canadian province: a population-based study. Am J Epidemiol
1999;149:916–24.
[31] Brenner H, Gefeller O. Use of the positive predictive value to correct for disease
misclassification in epidemiologic studies. Am J Epidemiol 1993;138:1007–15.
[32] Manuel DG, Rosella LC, Stukel TA. Importance of accurately identifying disease in
studies using electronic health records. BMJ 2010;341:c4226.
[33] Rocchi A, Benchimol EI, Bernstein CN, Bitton A, Feagan B, Panaccione R, et al.
Inflammatory bowel disease: a Canadian burden of illness review. Can J Gastroenterol
2012;26:811–7.
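
Illustrative note on the subtype rule in Table 5. The "k of last n codes" rules compared in Table 5 can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation. The function name `assign_subtype`, the use of the plain labels "CD" and "UC" in place of the underlying diagnostic codes, the oldest-first ordering of claims, and returning `None` when the threshold is not met are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def assign_subtype(codes: Sequence[str], k: int = 5, n: int = 9) -> Optional[str]:
    """Assign 'CD' or 'UC' when at least k of the last n subtype codes agree.

    `codes` is assumed to be ordered oldest first, with each entry already
    mapped to 'CD' or 'UC' (hypothetical labels, not raw billing codes).
    Returns None when neither subtype reaches the threshold; how such
    patients are handled in practice is not specified here.
    """
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    recent = list(codes)[-n:]  # the n most recent codes (fewer if history is short)
    cd_count = sum(1 for c in recent if c == "CD")
    uc_count = sum(1 for c in recent if c == "UC")
    # Every rule in Table 5 has k > n / 2, so both counts cannot reach k at once.
    if cd_count >= k and cd_count > uc_count:
        return "CD"
    if uc_count >= k and uc_count > cd_count:
        return "UC"
    return None


if __name__ == "__main__":
    history = ["UC", "UC", "CD", "CD", "UC", "CD", "CD", "UC", "CD", "CD"]
    print(assign_subtype(history))            # default rule: 5 of last 9 codes
    print(assign_subtype(history, k=3, n=3))  # stricter rule: 3 of last 3 codes
```

Raising k relative to n makes the rule stricter, so more patients are left unassigned under this sketch.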

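
Illustrative note on predictive values and prevalence. The Discussion points out that PPV and NPV change with the prevalence of disease in the population to which an algorithm is applied. The standard Bayes relationship behind that statement can be sketched as below. The sensitivity and specificity used in the demonstration (0.90 and 0.99) and the 30% cohort prevalence are hypothetical values chosen only for illustration and are not results of this study; the one in 150 prevalence is the figure cited in the text.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given disease prevalence.

    Assumes 0 < prevalence < 1 and that the test is not degenerate, so the
    denominators below are non-zero.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    SENS, SPEC = 0.90, 0.99  # hypothetical accuracy, for illustration only
    settings = [
        ("enriched validation cohort (assumed 30% prevalence)", 0.30),
        ("general population (1 in 150)", 1 / 150),
    ]
    for label, prev in settings:
        ppv, npv = predictive_values(SENS, SPEC, prev)
        print(f"{label}: PPV={ppv:.3f}, NPV={npv:.4f}")
```

With sensitivity and specificity held fixed, the PPV is much lower at the general-population prevalence than in an enriched cohort, which is why the prevalence of the validation cohort matters when predictive values are reported.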