Research Report

Using Resident-Sensitive Quality Measures Derived From Electronic Health Record Data to Assess Residents’ Performance in Pediatric Emergency Medicine

Alina Smirnova, MD, PhD, Saad Chahine, PhD, MEd, Christina Milani, MSc, Abigail Schuh, MD, MMHPE, Stefanie S. Sebok-Syer, PhD, Jordan L. Swartz, MD, MA, Jeffrey A. Wilhite, MPH, Adina Kalet, MD, MPH, Steven J. Durning, MD, PhD, Kiki M.J.M.H. Lombarts, PhD, MSc, Cees P.M. van der Vleuten, PhD, and Daniel J. Schumacher, MD, PhD, MEd

Downloaded from http://journals.lww.com/academicmedicine by BhDMf5ePHKav1zEoum1tQfN4a+kJLhEZgbsIHo4XMi0hCywCX1AWnYQp/IlQrHD3i3D0OdRyi7TvSFl4Cf3VC4/OAVpDDa8K2+Ya6H515kE= on 04/18/2024

Abstract
Purpose
Traditional quality metrics do not adequately represent the clinical work done by residents and, thus, cannot be used to link residency training to health care quality. This study aimed to determine whether electronic health record (EHR) data can be used to meaningfully assess residents’ clinical performance in pediatric emergency medicine using resident-sensitive quality measures (RSQMs).

Method
EHR data for asthma and bronchiolitis RSQMs from Cincinnati Children’s Hospital Medical Center, a quaternary children’s hospital, between July 1, 2017, and June 30, 2019, were analyzed by ranking residents based on composite scores calculated using raw, unadjusted, and case-mix adjusted latent score models, with lower percentiles indicating a lower quality of care and performance. Reliability and associations between the scores produced by the 3 scoring models were compared. Resident and patient characteristics associated with performance in the highest and lowest tertiles and changes in residents’ rank after case-mix adjustments were also identified.

Results
274 residents and 1,891 individual encounters of bronchiolitis patients aged 0–1 as well as 270 residents and 1,752 individual encounters of asthmatic patients aged 2–21 were included in the analysis. The minimum reliability requirement to create a composite score was met for asthma data (α = 0.77), but not bronchiolitis (α = 0.17). The asthma composite scores showed high correlations (r = 0.90–0.99) between raw, latent, and adjusted composite scores. After case-mix adjustments, residents’ absolute percentile rank shifted on average 10 percentiles. Residents who dropped by 10 or more percentiles were likely to be more junior, saw fewer patients, cared for less acute and younger patients, or had patients with a longer emergency department stay.

Conclusions
For some clinical areas, it is possible to use EHR data, adjusted for patient complexity, to meaningfully assess residents’ clinical performance and identify opportunities for quality improvement.

Please see the end of this article for information about the authors.

Correspondence should be addressed to Alina Smirnova, Office of Health and Medical Education Scholarship, University of Calgary, Calgary, AB T2R 0X7, Canada; email: alina.smirnova1@ucalgary.ca.

Written work prepared by employees of the Federal Government as part of their official duties is, under the U.S. Copyright Act, a “work of the United States Government” for which copyright protection under Title 17 of the United States Code is not available. As such, copyright does not extend to the contributions of employees of the Federal Government.

Acad Med. 2023;98:367–375.
First published online November 8, 2022
doi: 10.1097/ACM.0000000000005084

Supplemental digital content for this article is available at http://links.lww.com/ACADMED/B356.

Aligning learner and patient outcomes is a fundamental component of competency-based medical education (CBME). Without understanding how residency training is linked to quality of care, CBME cannot achieve its mandate of ensuring that training prepares graduates to provide the best possible care to populations of patients. 1 While several large studies have demonstrated that where a physician completes residency training predicts future clinical performance 2–8 and others have highlighted the critical role that residents play in ensuring patient care quality during training, 9,10 the use of clinical performance metrics in graduate medical education (GME) remains limited. Some have proposed a “big data” approach to elucidate the relationship between education and health care processes and outcomes, which uses electronic health record (EHR) data for quality metrics. 11,12 EHR data have previously been used to determine resident experience (e.g., conditions seen), 13 and obtaining resident performance quality metrics would be a logical next step. The success of this approach depends on the availability of metrics with reliability and validity evidence. 14

Traditional quality metrics do not adequately represent the clinical work performed by residents, and, therefore, cannot be used to link residency training to health care quality. 14–20 For instance, mortality rates are often multifactorial and usually cannot be attributed to a single clinician. Resident-sensitive quality measures (RSQMs) attempt to address this gap. RSQMs are clinical care measures that are both important to providing care for an illness of interest, and likely completed by a resident rather than another

Academic Medicine, Vol. 98, No. 3 / March 2023 367


member of the team or by the team collectively. 19,21 RSQMs in pediatric emergency medicine, developed in consultation with supervisors and residents, 19,21 have demonstrated a wide range of resident performance on both individual and composite measures of asthma, bronchiolitis, and closed head injury, 22 in relation to other variables such as patient complexity and acuity, 23 and for potential use in summative assessments. 24 For RSQMs to fulfill their potential for widespread use in GME, they must not only be relatively easily extractable from the EHR, but they must also demonstrate reliability and provide validity evidence.

The aim of this study was to examine whether RSQMs that are easily extractable from the EHR can meaningfully assess residents’ performance. In this proof-of-concept study, we assessed the feasibility of systematically collecting resident-specific performance measures from the EHR and evaluated the validity of these measures using the first 3 steps in Kane and Messick’s validity frameworks: scoring (supported by Messick’s response process), generalization (supported by Messick’s internal structure and content), and extrapolation (supported by Messick’s relationship to other variables). 25 We did not examine the use of these measures for decision support; rather, we aimed to highlight the ability to develop a metric of performance through the use of automatically extractable clinical performance measures, without the need for chart review, which is essential for applying these measures on a larger scale. We tested whether EHR data could be used to create RSQM composite scores that reliably discriminate high- and low-performing residents in management of 2 common respiratory diseases, bronchiolitis and asthma, in the pediatric emergency department (PED). Additionally, we aimed to understand which resident, faculty, and patient characteristics were associated with residents’ performance on these RSQMs. Providing evidence of validity and reliability for RSQMs that are easily extractable from the EHR can help provide a baseline for future replication studies and comparison between sites and, ultimately, support their wider application in CBME.

Method
Setting and participants
Bronchiolitis and asthma are 2 of the most common reasons for visits to PEDs, where residents have a high degree of autonomy in ordering treatments before the attending sees the patient. 26 Their frequency and evidence-based standardized care protocols make these chief complaints ideal for examining what residents do, and how they do it, when caring for patients with these illnesses. We did not include closed head injury RSQMs because they were less automatically extractable from the EHR. Any PED encounters with billing diagnoses of bronchiolitis or asthma, where a resident was assigned to the patient and completed the clinical note, were extracted from the EHR (Epic Systems, Verona, Wisconsin) at Cincinnati Children’s Hospital Medical Center (CCHMC) between July 1, 2017, and June 30, 2019. The deidentified datasets contained information about patient and resident characteristics, as well as the supervising faculty of record for the encounter. The full data dictionary is available in Supplemental Digital Appendix 1, at http://links.lww.com/ACADMED/B356. CCHMC is a quaternary children’s hospital with approximately 60,000 PED visits annually. Between 2 and 7 residents usually staff the department at any given time. The majority are categorical pediatric residents, who spend 4 months in the PED during their 3-year residency, and emergency medicine residents, who complete 6 months during a 4-year residency. Residents from pediatric combined training programs such as medicine–pediatrics (4–5 years) and family medicine (3 years) also rotate through the PED. We obtained CCHMC institutional review board approval before data extraction and analysis. The study findings were reported using the CONSORT extension to pilot and feasibility trials. 27

Data and measures
The analysis focused on easily extractable RSQMs for each illness of interest. Using composite scores based on several individual measures compensates for the potential lack of variability in single performance measures. 28–31 While RSQM composite scores based on 19–23 individual measures previously have shown good variability, 22 only 5 RSQMs were readily extractable from the EHR for each condition we studied, which were ultimately included in the analysis. We focused on RSQMs only available from the EHR since the aim of the study was to develop a set of measures that could be realistically replicated at other institutions (scalability).

Patients aged 0–1 year were included in the bronchiolitis dataset. Patients aged 2–21 years were included in the asthma dataset. The patient age ranges were chosen because bronchiolitis becomes less common over the age of 1 year and a formal asthma diagnosis is less common under 2 years of age. Additionally, we included several relevant patient, resident, and supervising faculty characteristics, described below.

Methodological approach
To provide evidence of scoring, we used a psychometric approach to determine appropriateness of individual items for the composite score for each condition. First, we calculated the proportion of patients who received appropriate care at the item level, with ideal values being ~0.50 and a range of 0.30–0.70 deemed generally acceptable. 32 We also calculated the internal consistency of each composite score, with a range of 0.7–1 indicating that items may be grouped together to provide an overall evaluation of competence in the care of bronchiolitis or asthma. 32

Once individual items passed the initial screening, we used 3 different models to create the composite scores: a raw score model (model 1), an unadjusted latent score model (model 2), and an adjusted latent score model (model 3). In model 1, each indicator was given equal weight, and each contributed to the overall score uniformly. In models 2 and 3, we used hierarchical generalized linear models to provide differential weights for each indicator. 33 In addition to valuing more difficult items compared with easier items, this approach can take into account various patient and resident characteristics. Model 2 was unadjusted, while model 3 tested several resident and patient characteristics as covariates. For model 3, the following statistically significant (P < .05) characteristics were included: patient age, PED length of stay, initial placement in the medical resuscitation bay, and year presented in the PED (see Supplemental Digital

Appendix 2, at http://links.lww.com/ACADMED/B356). In all models, indicators were nested within patients and patients within residents. We standardized the scores to have a mean of 50 and standard deviation of 10 to facilitate interpretation and comparability.

In models 2 and 3, we also assessed the difficulty of each item comprising the composite score as a final check to ensure each item significantly contributed to the latent composite scores. Item difficulty was judged based on the assumption that we were indirectly measuring a “latent” ability estimate (theta) of a person who would be considered average (50% of residents) in providing the correct treatment. The estimates range from −3 to +3. An item that would be equally difficult or easy for an average ability person would have a value of 0. Negative values indicate more difficult items.

To provide evidence of generalization within each model, we used composite scores to rank residents, where lower percentiles indicate a lower quality of care and, therefore, lower level of performance. Shifts in rankings were then evaluated using correlations and comparative statistical testing. Additionally, we examined which residents experienced the greatest shift in ranking by investigating which patient encounter characteristics were associated with at least a 10-point increase or decrease in RSQM percentile scores.

To provide evidence of extrapolation, we explored the relationships of residents’ performance in the top and bottom tertiles with several patient, resident, and faculty-of-record characteristics using t tests and chi-square tests. Patient characteristics were mean age, PED length of stay, whether they presented to a medical resuscitation bay compared with regular PED room initially, and time of presentation to the PED. Resident characteristics were postgraduate training year, sex, program type, and mean patient panel size. Supervising faculty-of-record characteristics were faculty’s own performance on the relevant RSQM calculated based on patient encounters seen by faculty within the same study period without a resident and mean patient panel size.

For descriptive and comparative analysis, we used SPSS statistical software, version 26 (SPSS, Inc., Chicago, Illinois), and SAS statistical software, version 9.4 (SAS Institute, Cary, North Carolina) for multilevel modeling. Models were then confirmed in HLM software, version 8.0 (Scientific Software International, Inc., Chapel Hill, North Carolina) and latent scores (models 2 and 3) were generated. There were no missing data.

Results
Asthma RSQMs
Of the 349 total residents, 270 (77%) treated at least one patient with an asthma exacerbation in the PED during the study period, amounting to 1,752 encounters (Table 1). Overall resident

Table 1
Demographic Characteristics of Residents and Patients, From a Study of Bronchiolitis and Asthma RSQMs for Pediatric Emergency Department Care, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, July 2017–June 2019

Characteristic | Bronchiolitis dataset: No. | Bronchiolitis dataset: Metric (95% CI) | Asthma dataset: No. | Asthma dataset: Metric (95% CI)
Resident
Mean composite score percentile | 274 | 82.64 (81.6, 83.6) | 270 | 50.0 (46.6, 53.4)
% PGY-1 | 120 | 43.8 (38.1, 49.7) | 124 | 45.9 (39.9, 52.1)
% PGY-2 | 71 | 25.9 (21.1, 31.4) | 69 | 25.6 (20.5, 31.2)
% PGY-3 | 61 | 22.3 (17.7, 27.6) | 56 | 20.7 (16.1, 26.1)
% PGY-4 | 22 | 8.0 (5.4, 11.9) | 20 | 7.4 (4.6, 11.2)
% Female gender | 161 | 58.8 (52.9, 64.4) | 158 | 58.5 (52.4, 64.5)
% Pediatric training program | 197 | 71.9 (66.3, 76.9) | 195 | 72.2 (66.5, 77.5)
% Emergency medicine training program | 67 | 24.5 (19.7, 29.9) | 66 | 24.4 (19.4, 30.0)
Mean patient panel size | 274 | 6.9 (6.3, 7.5) | 270 | 6.49 (6.0, 7.0)
Supervising faculty
Mean composite score percentile | 51 | 82.5 (80.1, 84.9) | 52 | 50.0 (46.6, 53.4)
Mean patient panel size | 55 | 34.4 (28.5, 40.3) | 52 | 48.7 (45.8, 51.6)
Patient
Mean age | 1,891 | 0.3 (0.3, 0.3) | 1,752 | 7.7 (7.4, 8.0)
Mean length of PED stay | 1,891 | 145.3 (133.8, 156.8) | 1,752 | 156.2 (149.7, 162.7)
% Presented to medical resuscitation bay initially | 1,891 | 19.3 (17.6, 21.1) | 1,752 | 15.4 (13.2, 16.5)
% Admitted | 1,891 | 63.6 (61.4, 65.8) | 1,752 | 41.1 (37.8, 42.7)
% Patients seen during the day (08:00–15:59) | 773 | 40.9 (38.7, 43.1) | 529 | 30.2 (28.1, 32.4)
% Patients seen during the evening (16:00–23:59) | 719 | 38.0 (35.9, 40.2) | 688 | 39.3 (37.0, 41.6)
% Patients seen overnight (0:00–7:59) | 399 | 21.1 (19.3, 23.0) | 535 | 30.5 (28.4, 32.7)

Abbreviations: RSQM, resident-sensitive quality measure; CI, confidence interval; PGY, postgraduate year; PED, pediatric emergency department.
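
The percentage rows in Table 1 pair a count with a 95% confidence interval. The article does not say which interval method was used, so the following is only a sketch under the assumption of a Wilson score interval; the function name `wilson_interval` is ours and purely illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion, returned as (low, high)."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the Table 1 row "% PGY-1", bronchiolitis dataset (120 of 274).
low, high = wilson_interval(120, 274)
print(f"{100 * 120 / 274:.1f}% ({100 * low:.1f}, {100 * high:.1f})")
```

For that row, this assumed method lands within about a tenth of a percentage point of the published bounds (38.1, 49.7). That is consistent with a Wilson-type interval but does not confirm the authors' choice.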

performance on asthma RSQMs is reported in Table 2. Internal consistency of treatment of asthma was good (α = 0.77). Table 2 reports the proportion of patients receiving the correct treatment for each item in addition to item difficulty for models 2 and 3. Based on these results, it was psychometrically justifiable to combine the individual asthma indicators into a composite score.

The unstandardized raw score model (model 1) produced a normal distribution (n = 270; m = 0.47; 95% confidence interval [CI] = 0.45, 0.49). Table 3 shows characteristics associated with ranking in the highest or lowest tertile, with the last column indicating significant differences. For example, residents ranking in the top tertile were significantly more likely to be in their first year of training or to be supervised by faculty who themselves scored higher on asthma RSQMs. Conversely, residents in the bottom tertile were significantly more likely to be in their second year of training or have had more encounters with patients who initially presented to the medical resuscitation bay. No differences were found in ranking between third and fourth postgraduate years, resident sex, type of residency program, patient panel size, or other patient characteristics. The unadjusted latent score model (model 2) produced almost identical results to model 1, with correlation values of r = 0.99 for the standardized scores. The tertile comparison between ranking of residents also showed very little difference between the 2 models.

Given the almost negligible difference and the ease of calculating RSQM composite scores in model 1, we opted to compare model 1 with the adjusted latent model (model 3). Although the correlation between models 1 and 3 was extremely high (r = 0.92), there was a noticeable difference in the tertile groupings between the 2 models (Table 4). Compared with model 1, model 3 resulted in higher RSQM percentile scores for residents treating more acute patients, characterized by the initial presentation to the medical resuscitation bay, and penalized those who were treating fewer acute patients (Table 5). On average, residents shifted 10.08 percentiles after patient characteristics were taken into account. As highlighted in Table 5, residents who increased at least 10 points on their RSQM percentile scores were in their second, third, or fourth postgraduate year in July 2017, while residents in their first training year decreased in their ranking.

Bronchiolitis RSQMs
Of the 349 total residents, 274 residents (79%) treated patients with bronchiolitis, amounting to 1,891 encounters over 2 years (Table 1). Overall resident performance on bronchiolitis RSQMs is reported in Table 2. The internal consistency of bronchiolitis treatment was very poor (α = 0.17). From a psychometric perspective, the items are considered too easy (i.e., the care was uniformly good), with an average performance of the individual bronchiolitis RSQMs of 82.4%. Only half of patients had “nasal bulb suction teaching ordered,” making this the only indicator able to distinguish among residents. Based on these results, a composite bronchiolitis RSQM score could not reliably differentiate higher-skilled residents from lower-skilled ones.

Discussion
This is the first study to provide evidence for scoring, generalizability,

Table 2
Item Scoring of RSQMs and Characteristics of Individual RSQMs Included in Composite Scores, From a Study of Bronchiolitis and Asthma RSQMs for Pediatric Emergency Department Care, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, July 2017–June 2019

Condition | No. | Proportion (95% CI) of patients receiving appropriate treatment provided by residents | Item difficulty (model 2, unadjusted latent) | Item difficulty (model 3, adjusted latent)
Bronchiolitis (maximum score = 5) | 1,891 | | |
No antibiotics ordered (1 point) | 1,773 | 93.8 (92.6, 94.8) | - | -
No albuterol ordered (1 point) | 1,677 | 88.7 (87.2, 90.0) | - | -
No steroid ordered (1 point) | 1,818 | 96.1 (95.2, 97.9) | - | -
No chest X-ray ordered (1 point) | 1,560 | 82.5 (80.7, 84.2) | - | -
Nasal bulb suction teaching ordered (1 point) | 962 | 50.8 (48.6, 53.1) | - | -
Average score | ~1,559 | 82.4 (80.6, 84.1) | - | -
Asthma (maximum score = 5) | 1,752 | | |
Albuterol ordered (1 point) | 890 | 50.8 (48.5, 53.1) | 1.59ᵃ | 1.20ᵃ
Correct dose of albuterol ordered (1 point) | 982 | 56.1 (53.7, 58.4) | 1.88ᵃ | 1.82ᵃ
Dexamethasone steroid ordered (1 point) | 834 | 47.6 (45.3, 49.9) | 1.42ᵃ | 1.52ᵃ
Correct dose of dexamethasone ordered (1 point) | 731 | 41.7 (39.4, 44.0) | 1.10ᵃ | 1.05ᵃ
Asthma order set used (1 point) | 628 | 35.8 (33.6, 38.1) | −0.77ᵃ | −0.64ᵃ
Average score | ~781 | 44.6 (42.3, 46.9) | - | -

Abbreviations: RSQM, resident-sensitive quality measure; CI, confidence interval.
ᵃ Items that significantly contribute to the latent composite score (P < .001).
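
The item-level screening behind Table 2 (proportion of encounters receiving appropriate care, with 0.30–0.70 deemed acceptable, plus internal consistency of at least 0.7) can be sketched as follows. This is a minimal illustration, not the authors' SPSS/SAS code: it assumes internal consistency means Cronbach's alpha computed over an encounters-by-items matrix of 0/1 scores, and the function names and toy data are hypothetical.

```python
from statistics import mean, variance


def item_proportions(scores):
    """Share of encounters scoring 1 on each item (rows = encounters, columns = items)."""
    return [mean(column) for column in zip(*scores)]


def acceptable_items(scores, low=0.30, high=0.70):
    """Flag items whose proportion falls inside the range the article calls acceptable."""
    return [low <= p <= high for p in item_proportions(scores)]


def cronbach_alpha(scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(scores[0])
    item_variance_sum = sum(variance(column) for column in zip(*scores))
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical toy data: 4 encounters x 3 items that always agree with each other.
toy = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
print(item_proportions(toy), acceptable_items(toy), round(cronbach_alpha(toy), 2))
```

When most encounters score 1 on an item, as with the bronchiolitis measures, the item has little variance to share with the others. That is one way a composite can end up with a low alpha even though care was uniformly good.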

Table 3
Resident, Faculty, and Patient Characteristics Associated With Residents Being Ranked in the Top or Bottom Tertile Based on Asthma Raw Composite Score, From a Study of Bronchiolitis and Asthma RSQMs for Pediatric Emergency Department Care, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, July 2017–June 2019

Asthma RSQM raw composite score
Characteristic | Top tertile: No. | Top tertile: % or mean (95% CI) | Bottom tertile: No. | Bottom tertile: % or mean (95% CI) | P value
Resident characteristics (n = 270)
Mean composite score percentileᵃ | 89 | 83.0 (46.6, 53.4) | 83 | 16.0 (14.1, 17.9) | < .001
% PGY1ᵇ | 46 | 51.7 (41.5, 61.8) | 27 | 32.5 (23.4, 43.2) | .011
% PGY2ᵇ | 18 | 20.2 (13.2, 29.7) | 30 | 36.1 (26.6, 46.9) | .02
% PGY3ᵇ | 17 | 19.1 (12.3, 28.5) | 18 | 21.7 (14.2, 31.7) | .674
% PGY4ᵇ | 7 | 7.9 (3.9, 15.4) | 8 | 9.6 (5.0, 17.9) | .68
% Female sexᵇ | 50 | 56.2 (45.8, 66.0) | 47 | 56.6 (45.9, 66.8) | .953
% Pediatric training programᵇ | 62 | 69.7 (59.5, 78.2) | 60 | 72.2 (61.8, 80.8) | .705
% Emergency medicine training programᵇ | 24 | 27.0 (18.8, 37.0) | 20 | 24.1 (16.2, 34.3) | .666
Mean patient panel sizeᶜ | - | 6.2 (5.3, 7.1) | - | 6.7 (5.7, 7.7) | .453
Supervising faculty characteristics (n = 52)
Mean composite score percentileᵃ | - | 62.3 (56.8, 67.8) | - | 35.6 (30.0, 41.2) | < .001
Mean patient panel sizeᵃ | - | 47.7 (42.6, 52.8) | - | 50.8 (45.7, 55.9) | .403
Patient characteristics (n = 1,752)
Mean ageᵃ | - | 7.3 (6.9, 7.8) | - | 7.8 (7.2, 8.5) | .229
Mean PED length of stayᵃ | - | 165.3 (148.7, 181.8) | - | 150.4 (141.6, 159.2) | .129
% Presented to medical resuscitation bay initiallyᶜ | - | 7.1 (4.9, 9.4) | - | 23.8 (18.7, 28.9) | < .001
% Admittedᶜ | - | 37.7 (31.7, 43.6) | - | 42.9 (37.0, 48.8) | .162
% Patients seen during the day (08:00–15:59)ᶜ | - | 31.0 (25.2, 36.7) | - | 30.8 (24.8, 36.8) | .850
% Patients seen during the evening (16:00–23:59)ᶜ | - | 39.5 (33.3, 45.7) | - | 37.4 (32.3, 42.6) | .739
% Patients seen overnight (24:00–7:59)ᶜ | - | 29.5 (23.8, 35.3) | - | 31.8 (26.2, 37.3) | .457

Abbreviations: RSQM, resident-sensitive quality measure; CI, confidence interval; PGY, postgraduate year; PED, pediatric emergency department.
ᵃ t test.
ᵇ Chi-square test.
ᶜ Mann-Whitney U test.
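
The raw-score pipeline that feeds Table 3 (model 1: equal item weights, standardization to mean 50 and SD 10, percentile ranking, tertile grouping) might look roughly like the sketch below. The article does not give these formulas, so several choices here are assumptions: a resident's raw score is taken as the mean fraction of items met across encounters, ties share a mid-rank percentile, and tertile 1 is the top tertile. All names and data are hypothetical.

```python
from statistics import mean, pstdev


def raw_composite(encounters):
    """encounters: iterable of (resident_id, [0/1 item scores]).
    Returns {resident_id: mean fraction of items met across that resident's encounters}."""
    per_resident = {}
    for resident, items in encounters:
        per_resident.setdefault(resident, []).append(sum(items) / len(items))
    return {resident: mean(values) for resident, values in per_resident.items()}


def standardize(scores, target_mean=50.0, target_sd=10.0):
    """Rescale scores linearly so they have the requested mean and standard deviation."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    return {r: target_mean + target_sd * (s - mu) / sd for r, s in scores.items()}


def percentile_ranks(scores):
    """Percent of residents scoring lower, counting ties as half (an assumed convention)."""
    values = list(scores.values())
    n = len(values)
    return {
        r: 100.0 * (sum(v < s for v in values) + 0.5 * sum(v == s for v in values)) / n
        for r, s in scores.items()
    }


def tertile(percentile):
    """1 = top third, 2 = middle third, 3 = bottom third (assumed numbering)."""
    if percentile >= 200 / 3:
        return 1
    return 2 if percentile >= 100 / 3 else 3


# Hypothetical encounters for three residents, five items each.
encounters = [("A", [1, 1, 1, 1, 1]), ("A", [1, 1, 1, 0, 0]),
              ("B", [1, 0, 0, 0, 0]), ("C", [1, 1, 0, 0, 0])]
ranks = percentile_ranks(standardize(raw_composite(encounters)))
print({resident: tertile(p) for resident, p in ranks.items()})
```

Because standardization is a monotone rescaling, it leaves the ranking unchanged. Its role is the one the article states: putting scores from different models on a comparable scale.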

and extrapolation of available RSQMs from the EHR for 2 common pediatric conditions, to our knowledge. Scoring and generalizability evidence favors asthma RSQM composite scores, but not bronchiolitis. Overall, the 5-item asthma composite RSQM scores could discriminate between high and low performers, suggesting there is a proportion of the variation in patient care that is attributable to the actions of residents making decisions about care. This discrimination improves when data are adjusted for patient acuity, age, and academic year, favoring the adjusted model. Extrapolation evidence for the asthma composite RSQM scores indicates associations with resident postgraduate year, faculty performance, and patient acuity where first-year residents are overrepresented in the top tertile and second-year residents overrepresented in the bottom tertile; better faculty performance is associated with better resident performance; and poor resident performance is associated with higher patient acuity. On the other hand, the 5-item bronchiolitis composite RSQM scores showed low variability with consistently high resident performance. Thus, from a psychometric perspective,

Table 4
Number of Residents Grouped in Tertiles Using Adjusted Asthma Composite Score Compared With Raw Composite Score, From a Study of Bronchiolitis and Asthma RSQMs for Pediatric Emergency Department Care, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, July 2017–June 2019

Tertile based on adjusted composite asthma RSQM score, no. (%) residents
Tertile based on raw composite RSQM score | 1 (n = 90) | 2 (n = 90) | 3 (n = 90)
1 (n = 89) | 73 (81.1) | 16 (17.8) | 0
2 (n = 98) | 15 (16.7) | 61 (67.8) | 22 (24.4)
3 (n = 83) | 2 (2.2) | 13 (14.4) | 68 (75.6)

Abbreviation: RSQM, resident-sensitive quality measure.

Table 5
Resident, Faculty, and Patient Characteristics Associated With an Absolute Rank Difference of at Least 10 Percentiles After Adjustment, From a Study of Bronchiolitis and Asthma RSQMs for Pediatric Emergency Department Care, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, July 2017–June 2019

Based on minimum 10 percentile change from raw to adjusted score
Characteristic | >10 Percentile increase: No. | >10 Percentile increase: % or mean (95% CI) | >10 Percentile decrease: No. | >10 Percentile decrease: % or mean (95% CI) | P value
Resident characteristics (n = 270)
Mean composite score percentileᵃ | 55 | 39.9 (34.2, 45.5) | 56 | 57.4 (52.5, 62.3) | < .001
Mean adjusted latent score percentileᵃ | 55 | 59.8 (54.4, 65.3) | 56 | 39.5 (34.4, 44.6) | < .001
% PGY1ᵇ | 11 | 20.0 (11.5, 32.4) | 40 | 71.4 (58.5, 81.6) | < .001
% PGY2ᵇ | 23 | 41.8 (29.7, 55.0) | 10 | 17.9 (10.0, 29.8) | .006
% PGY3ᵇ | 13 | 23.6 (14.4, 36.4) | 6 | 10.7 (5.0, 21.5) | .043
% PGY4ᵇ | 7 | 12.7 (6.3, 24.0) | 0 | 0 (0.0, 0.1)ᶜ | .006
% Female sexᵇ | 30 | 54.6 (41.5, 67.0) | 32 | 57.1 (44.1, 69.2) | .783
% Pediatric training programᵇ | 36 | 65.5 (52.3, 76.6) | 43 | 76.8 (64.2, 85.9) | .118
% Emergency medicine training programᵇ | 18 | 32.7 (21.8, 45.9) | 11 | 19.6 (11.3, 31.8) | .666
Mean patient panel sizeᵈ | 55 | 7.7 (6.4, 9.1) | 56 | 5.4 (4.5, 6.3) | .007
Supervising faculty characteristics (n = 52)
Mean composite score percentilesᵃ | - | 50.5 (43.4, 57.5) | - | 50.2 (43.0, 57.4) | .96
Mean patient panel sizeᵃ | - | 47.5 (40.6, 54.3) | - | 48.2 (41.6, 54.9) | .876
Patient characteristics (n = 1,752)
Mean ageᵃ | - | 8.1 (7.5, 8.8) | - | 7.0 (6.6, 7.5) | .009
Mean length of PED stayᵃ | - | 142.6 (134.4, 150.8) | - | 170.3 (160.2, 180.5) | < .001
% Presented to the medical resuscitation bay on initial arrivalᵈ | - | 37.0 (32.9, 41.0) | - | 1.1 (0.1, 2.1) | < .001
% Admittedᵈ | - | 46.6 (40.4, 52.8) | - | 38.4 (31.0, 45.8) | .075
% Patients seen during the day (08:00–15:59) (n = 529)ᵈ | - | 30.8 (24.2, 36.4) | - | 28.0 (21.7, 34.2) | .401
% Patients seen during the evening (16:00–23:59) (n = 688)ᵈ | - | 40.0 (34.3, 45.5) | - | 42.6 (35.6, 49.5) | .724
% Patients seen overnight (24:00–7:59) (n = 525)ᵈ | - | 29.3 (23.4, 35.2) | - | 29.5 (22.8, 63.2) | .762

Abbreviations: RSQM, resident-sensitive quality measure; CI, confidence interval; PED, pediatric emergency department.
ᵃ t test.
ᵇ Chi-square test.
ᶜ 97.5% confidence interval.
ᵈ Mann-Whitney U test.
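
The comparison of raw and adjusted rankings summarized in Tables 4 and 5 comes down to two operations: a per-resident percentile shift with a 10-percentile threshold, and a cross-tabulation of tertile membership under the two models. The sketch below is illustrative only; the function names are ours, and the only real data are the Table 4 counts, from which we (not the article) derive the share of residents who stayed in the same tertile.

```python
from collections import Counter
from statistics import mean


def rank_shifts(raw_percentiles, adjusted_percentiles):
    """Adjusted minus raw percentile for each resident (positive = moved up)."""
    return {r: adjusted_percentiles[r] - raw_percentiles[r] for r in raw_percentiles}


def mean_absolute_shift(shifts):
    """Average size of the shift, ignoring direction."""
    return mean(abs(delta) for delta in shifts.values())


def large_shifts(shifts, threshold=10.0):
    """Residents who rose or fell by at least `threshold` percentiles."""
    up = sorted(r for r, delta in shifts.items() if delta >= threshold)
    down = sorted(r for r, delta in shifts.items() if delta <= -threshold)
    return up, down


def tertile_crosstab(raw_tertiles, adjusted_tertiles):
    """Counts of residents by (raw tertile, adjusted tertile)."""
    return Counter((raw_tertiles[r], adjusted_tertiles[r]) for r in raw_tertiles)


# Published Table 4 counts, keyed as (raw tertile, adjusted tertile).
table4 = {(1, 1): 73, (1, 2): 16, (1, 3): 0,
          (2, 1): 15, (2, 2): 61, (2, 3): 22,
          (3, 1): 2, (3, 2): 13, (3, 3): 68}
total = sum(table4.values())
unchanged = sum(count for (raw, adj), count in table4.items() if raw == adj)
print(f"{unchanged} of {total} residents ({100 * unchanged / total:.1f}%) kept their tertile")
```

On the Table 4 counts, 202 of 270 residents sit on the diagonal. So roughly a quarter changed tertile after case-mix adjustment, despite the high correlation (r = 0.92) between the two scores.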

it poorly discriminates amongst residents’ performance. By providing baseline scoring, generalizability, and extrapolation evidence for future replication studies, this study facilitates the wider application of EHR data to provide individualized, performance-based information for residents and programs. 16

In contrast to asthma care, residents consistently provided high-quality care to patients with bronchiolitis on the chosen measures. Although there was no QI initiative around bronchiolitis, it is possible that residents performed better on bronchiolitis measures because the standard of care for bronchiolitis is far simpler and less dependent on clinical context compared with asthma exacerbations. Although this finding may not be desirable from a psychometric perspective, it is important from a mastery perspective. While consistently high performance on bronchiolitis RSQMs does not allow for creation of composite scores or performance ranking, such measures can be used to identify outliers. It would be meaningful for training programs to know if a resident underperforms compared with their peers since, from a patient care perspective, any substandard performance by a resident is, by definition, a compromise of patient care. Such measures can be considered a baseline clinical benchmark and significant signal, for all residents should consistently provide high-quality patient care based on the selected measures. On the other hand, asthma RSQM composite scores showed more variability in resident performance, and therefore, may be more useful for differentiating between the performance of individual residents. This finding raises an important question, whether (or which) RSQMs could be used as both criterion-based and performance differentiating measures in the context of a CBME program of assessment.

The finding that higher-performing residents were more likely to be supervised by higher-performing faculty

may represent the interdependence faculty report). This also resonates with a with these illnesses in other settings,
between residents and faculty, previous study of primary care physicians this information about the PED may be
especially earlier in residency training. 34 that found adjustment for patient panel used by the program to target residents
Interestingly, this relationship did characteristics affected physician ranking who did not get this experience
not hold once patient characteristics on quality measures by an average of 7.6 to ensure competency is achieved.
were taken into account. Thus, patient percentiles, with more than a third of Regarding patient care, low use of
complexity may explain the performance physicians studied being reclassified into the asthma order set could prompt
of both faculty and residents. While a different tertile. 37 the program to emphasize the use of
this study cannot prove causation, this clinical tools for residents before the
finding may be a further indication that While adjustment for case-mix start of the rotation. Residents may also
individual practice style may be shaped characteristics avoids unfairly penalizing benefit from closer clinical supervision
during residency training in a manner residents who tend to see sicker when caring for patients with acute
that imprints future practice patterns for patients, 38 from a patient perspective, exacerbation of asthma, rather than
decades to come. 2,5 these findings suggest that sicker patients bronchiolitis, especially in their first 2
may not consistently receive high-quality years of training and with higher acuity
We found that residents in the beginning care in this setting. Alternatively, sicker patients.
of their training were more likely to patients may prompt faculty and residents
score higher on asthma RSQMs, which to engage with the system in a different Strengths and limitations
goes against the logic that performance way that is not captured by this EHR Using EHR data to assess residents has
improves with experience. 25 This finding data. 39 Hence, case-mix adjustment may several strengths. We were able to obtain
could be explained by more junior be reasonable from an assessment point data for all patients with bronchiolitis and
residents being more likely to follow of view, but one should not overlook the asthma exacerbation seen by a majority
standards of care provided to them opportunities for improvement in patient of residents rotating through the PED
because they do not yet have their own care from a care quality perspective. within the study period. Using RSQMs
care style and preferences. It could also be strengthens the assessment process by
due to greater interdependence between Implications for practice providing resident-level as well as patient-
faculty and first-year residents, in which RSQM composite measures, easily level data. It is possible to objectively
less experienced residents are more extractable from the EHR, can provide identify underperforming residents,
likely to review orders with their faculty meaningful information about residents’ while providing feedback to residents
before placing them in the EHR. 35 As clinical performance and point out that is performance-specific, allowing
residents advance in their training and opportunities for improving health care them to track and work to improve their
gain more independence, they are also delivery. Although our application in this performance. This method provides
given more responsibility and harder study is just one example, RSQMs are unique opportunities for identifying
tasks, leading to a lower scores because generalizable to any training program needed quality improvement initiatives,
the task (i.e., the requisite patient care) that uses an EHR. In the future, RSQM thereby responding to the goal of GME to
is harder. 36 In this study, the apparent use can help set a standard among improve quality of care while supporting
overrepresentation of first-year residents programs for resident feedback, in the goals of CBME.
in the top tertile seems to be accounted addition to setting educational goals
for by seeing fewer patients overall, when residents are missing clinical This study has several limitations. First,
seeing fewer acute patients, or potentially experiences. A similar procedure of this was a single-institution study. While
spending more time on patient care. defining and testing RSQMs can be this controls for environmental factors
In addition, residents in the second applied to other clinical settings, although when comparing between residents at
year of their training performed worse future replication studies are needed the same institution, it is necessary to
on asthma RSQMs, but their ranking to examine the feasibility of extracting determine whether these findings are
improved on average when higher RSQMs from EHRs in different contexts, generalizable to other institutions. More
patient acuity and larger patient panel such as using different EHR systems. work is needed to provide residents and
sizes were taken into account. This may When calculating RSQM composite programs with a full dashboard of RSQMs,
have implications for the number and scores, however, some programs might as performing well on one measure does
acuity of patients whom residents should prefer calculating raw scores, but not translate to good performance on all
be allowed to manage to ensure they residents should be compared within measures.28,31 Second, EHR data may not
are positioned to provide high-quality their own level of experience/cohort. provide sufficient information necessary
care. In an earlier study, Schumacher for risk adjustment.29 Variations in
and colleagues found higher patient EHR data can also provide direct electronic charting systems may limit the
acuity and complexity to be associated information for program directors about ability of RSQMs to be replicated at other
with a higher RSQM composite score the levels of exposure of individual sites. This work is currently underway.
for both asthma and bronchiolitis after residents to various clinical scenarios. Finally, the weights of individual items in
controlling for postgraduate year. 23 Our In this study, roughly 20% of residents the composite score may not reflect their
current study builds upon these findings did not get the experience of caring true proportional contribution to patient
using a more objective measure of for patients with asthma exacerbation outcome.29 Health care is ultimately
acuity (i.e., presentation to the medical or bronchiolitis in the PED. While provided by teams in a collaborative way,
resuscitation bay on arrival rather than residents may have cared for patients and a single RSQM, or even a composite of


them, cannot fully encompass the quality of care delivered by a single resident.

Future research
Future studies are needed to replicate these findings at other sites as well as in programs located in smaller and rural locations. This research should focus on understanding the relationships between faculty and resident performance as well as the effects of patient case-mix and other resident characteristics on clinical performance.17 Further research should examine the extent of interdependence between residents and faculty in decision making around orders at different stages of training to better interpret the extrapolation evidence.36 Studying the relationship between RSQMs and other traditional workplace-based assessment approaches would provide additional extrapolation evidence for this novel method of assessment. Further research could also focus on studying cross-classification effects stemming from the interdependent nature of team-based patient care39 while developing the methodological ingenuity needed to capture both nesting and cross-effects when measuring such performance.35,40

Conclusions
This study shows that EHR data for 2 specific clinical conditions can be used to meaningfully assess residents’ clinical performance in the context of a CBME program of assessment, and to identify opportunities for improving health care delivery. Our findings suggest that, in the context of a CBME program of assessment, meaningful RSQMs should include performance differentiating measures that are criterion based, valid, and reliable to capture a wider range of resident performance in practice.

Funding/Support: The authors would like to thank the Edward J. Stemmler, MD Medical Education Research Fund of the National Board of Medical Examiners for funding the work of this collaborative group.

Other disclosures: None reported.

Ethical approval: Cincinnati Children’s Hospital Medical Center (CCHMC) institutional review board approval was obtained before data extraction and analysis.

Disclaimers: The views expressed herein are those of the authors and not necessarily those of the U.S. Department of Defense or other federal agencies.

Previous presentations: Preliminary results of this study were presented at the Office of Health and Medical Education Scholarship Symposium (OHMES), February 8, 2021, in Calgary, Canada (virtual conference). This study was given as an oral research paper presentation at the AMEE conference in Lyon, France, August 29–31, 2022.

Data: Only data from CCHMC were used for this study. CCHMC institutional review board approval was obtained before data extraction and analysis.

A. Smirnova is clinical assistant professor, Department of Family Medicine, University of Calgary, Calgary, Alberta, Canada, and adjunct assistant professor, Kern Institute for the Transformation of Medical Education, Medical College of Wisconsin, Milwaukee, Wisconsin; ORCID: https://orcid.org/0000-0003-4491-3007.

S. Chahine is associate professor of measurement and assessment, Faculty of Education, Queen’s University, Kingston, Ontario, Canada; ORCID: https://orcid.org/0000-0003-0488-773X.

C. Milani is clinical research assistant, Bruyère Research Institute, Ottawa, Ontario, Canada.

A. Schuh is associate professor of pediatrics, Division of Emergency Medicine, Medical College of Wisconsin, Milwaukee, Wisconsin; ORCID: http://orcid.org/0000-0002-6422-2361.

S.S. Sebok-Syer is assistant professor, Department of Emergency Medicine, Stanford University School of Medicine, Palo Alto, California; ORCID: http://orcid.org/0000-0002-3572-5971.

J.L. Swartz is clinical associate professor, Ronald O. Perelman Department of Emergency Medicine, NYU Grossman School of Medicine, and director of clinical informatics, Department of Emergency Medicine, NYU Langone Health, New York, New York.

J.A. Wilhite is senior research coordinator, Department of Medicine, NYU Langone Health, New York, New York; ORCID: https://orcid.org/0000-0003-4096-8473.

A. Kalet is professor and Steven and Shelagh Roell Chair, Robert D. and Patricia P. Kern Institute for the Transformation of Medical Education, Medical College of Wisconsin, Milwaukee, Wisconsin; ORCID: http://orcid.org/0000-0003-4855-0223.

S.J. Durning is professor and vice chair, Department of Medicine, and director, Center for Health Professions Education, Uniformed Services University of the Health Sciences, Bethesda, Maryland; ORCID: http://orcid.org/0000-0001-5223-1597.

K.M.J.M.H. Lombarts is professor of professional performance, Department of Medical Psychology, Amsterdam University Medical Centers, University of Amsterdam, and Amsterdam Public Health research institute, Amsterdam, The Netherlands; ORCID: https://orcid.org/0000-0001-6167-0620.

C.P.M. van der Vleuten is professor of education, Department of Educational Development and Research, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands; ORCID: http://orcid.org/0000-0001-6802-3119.

D.J. Schumacher is professor of pediatrics, Division of Emergency Medicine, Cincinnati Children’s Hospital Medical Center/University of Cincinnati College of Medicine, Cincinnati, Ohio; ORCID: http://orcid.org/0000-0001-5507-8452.

References
1 Frenk J, Chen L, Bhutta ZA, et al. Health professionals for a new century: Transforming education to strengthen health systems in an interdependent world. Lancet. 2010;376:1923–1958.
2 Asch DA, Nicholson S, Srinivas S, Herrin J, Epstein AJ. Evaluating obstetrical residency programs using patient outcomes. JAMA. 2009;302:1277–1283.
3 Epstein AJ, Srinivas SK, Nicholson S, Herrin J, Asch DA. Association between physicians’ experience after training and maternal obstetrical outcomes: Cohort study. BMJ. 2013;346:f1596.
4 Asch DA, Nicholson S, Srinivas SK, Herrin J, Epstein AJ. How do you deliver a good obstetrician? Outcome-based evaluation of medical education. Acad Med. 2014;89:24–26.
5 Bansal N, Simmons KD, Epstein AJ, Morris JB, Kelz RR. Using patient outcomes to evaluate general surgery residency program performance. JAMA Surg. 2016;151:111–119.
6 Chen C, Petterson S, Phillips R, Bazemore A, Mullan F. Spending patterns in region of residency training and subsequent expenditures for care provided by practicing physicians for Medicare beneficiaries. JAMA. 2014;312:2385–2393.
7 Sirovich BE, Lipner RS, Johnston M, Holmboe ES. The association between residency training and internists’ ability to practice conservatively. JAMA Intern Med. 2014;174:1640–1648.
8 Phillips RL, Petterson SM, Bazemore AW, Wingrove P, Puffer JC. The effects of training institution practice costs, quality, and other characteristics on future practice. Ann Fam Med. 2017;15:140–148.
9 Denson JL, McCarty M, Fang Y, Uppal A, Evans L. Increased mortality rates during resident handoff periods and the effect of ACGME duty hour regulations. Am J Med. 2015;128:994–1000.
10 Denson JL, Jensen A, Saag HS, et al. Association between end-of-rotation resident transition in care and mortality among hospitalized patients. JAMA. 2016;316:2204–2213.
11 Chahine S, Kulasegaram K, Wright S, et al. A call to investigate the relationship between education and health outcomes using big data. Acad Med. 2018;93:829–832.
12 Arora VM. Harnessing the power of big data to improve graduate medical education: Big idea or bust? Acad Med. 2018;93:833–834.
13 Levin JC, Hron J. Automated reporting of trainee metrics using electronic clinical systems. J Grad Med Educ. 2017;9:361–365.
14 Smirnova A, Sebok-Syer SS, Chahine S, et al. Defining and adopting clinical performance measures in graduate medical education: Where are we now and where are we going? Acad Med. 2019;94:671–677.
15 Kalet AL, Gillespie CC, Schwartz MD, et al. New measures to establish the evidence base for medical education: Identifying educationally sensitive patient outcomes. Acad Med. 2010;85:844–851.
16 Simpson D, Sullivan GM, Artino AR, Jr, Deiorio NM, Yarris LM. Envisioning graduate medical education in 2030. J Grad Med Educ. 2020;12:235–240.


17 Schumacher DJ, van der Vleuten CP, Carraccio CL. The future of high-quality care depends on better assessment of physician performance. JAMA Pediatr. 2016;170:1131–1132.
18 Wyer PC. Assessing resident performance: Do we know what we are evaluating? Ann Emerg Med. 2019;74:679–681.
19 Schumacher DJ, Holmboe ES, van der Vleuten C, Busari JO, Carraccio C. Developing resident-sensitive quality measures: A model from pediatric emergency medicine. Acad Med. 2018;93:1071–1078.
20 Kinnear B, Kelleher M, Sall D, et al. Development of resident-sensitive quality measures for inpatient general internal medicine. J Gen Intern Med. 2021;36:1271–1278.
21 Schumacher DJ, Martini A, Holmboe E, et al. Developing resident-sensitive quality measures: Engaging stakeholders to inform next steps. Acad Pediatr. 2019;19:177–185.
22 Schumacher DJ, Martini A, Holmboe E, et al. Initial implementation of resident-sensitive quality measures in the pediatric emergency department: A wide range of performance. Acad Med. 2020;95:1248–1255.
23 Schumacher DJ, Holmboe E, Carraccio C, et al. Resident-sensitive quality measures in the pediatric emergency department: Exploring relationships with supervisor entrustment and patient acuity and complexity. Acad Med. 2020;95:1256–1264.
24 Schumacher DJ, Martini A, Sobolewski B, et al. Use of resident-sensitive quality measure data in entrustment decision making: A qualitative study of clinical competency committee members at one pediatric residency. Acad Med. 2020;95:1726–1735.
25 Cook DA, Brydges R, Ginsburg S, Hatala R. A contemporary approach to validity arguments: A practical guide to Kane’s framework. Med Educ. 2015;49:560–575.
26 Mittiga MR, Schwartz HP, Iyer SB, Gonzalez del Rey JA. Pediatric emergency medicine residency experience: Requirements versus reality. J Grad Med Educ. 2010;2:571–576.
27 Lancaster GA, Thabane L. Guidelines for reporting non-randomised pilot and feasibility studies. Pilot Feasibility Stud. 2019;5:114.
28 Smith KA, Sussman JB, Bernstein SJ, Hayward RA. Improving the reliability of physician “report cards.” Med Care. 2013;51:266–274.
29 Shwartz M, Restuccia JD, Rosen AK. Composite measures of health care provider performance: A description of approaches. Milbank Q. 2015;93:788–825.
30 Scholle SH, Roski J, Adams JL, et al. Benchmarking physician performance: Reliability of individual and composite measures. Am J Manag Care. 2008;14:833–838.
31 Parkerton PH, Smith DG, Belin TR, Feldbau GA. Physician performance assessment: Nonequivalence of primary care measures. Med Care. 2003;41:1034–1047.
32 Yudkowsky R, Park YS, Downing SM, eds. Assessment in Health Professions Education. New York, NY: Routledge; 2019.
33 Kamata A. Item analysis by the hierarchical generalized linear model. J Educ Meas. 2001;38:79–93.
34 Sebok-Syer SS, Chahine S, Watling CJ, Goldszmidt M, Cristancho S, Lingard L. Considering the interdependence of clinical performance: Implications for assessment and entrustment. Med Educ. 2018;52:970–980.
35 Sebok-Syer SS, Shepherd L, McConnell A, Dukelow AM, Sedran R, Lingard L. “EMERGing” electronic health record data metrics: Insights and implications for assessing residents’ clinical performance in emergency medicine. AEM Educ Train. 2021;5:e10501.
36 Sebok-Syer SS, Goldszmidt M, Watling CJ, Chahine S, Venance SL, Lingard L. Using electronic health record data to assess residents’ clinical performance in the workplace: The good, the bad, and the unthinkable. Acad Med. 2019;94:853–860.
37 Hong CS, Atlas SJ, Chang Y, et al. Relationship between patient panel characteristics and primary care physician clinical performance rankings. JAMA. 2010;304:1107–1113.
38 Gebauer S, Steele E. Questions program directors need to answer before using resident clinical performance data. J Grad Med Educ. 2016;8:507–509.
39 Sebok-Syer SS, Pack R, Shepherd L, et al. Elucidating system-level interdependence in electronic health record data: What are the ramifications for trainee assessment? Med Educ. 2020;54:738–747.
40 Sebok-Syer SS, Shaw JM, Asghar F, Panza M, Syer MD, Lingard L. A scoping review of approaches for measuring “interdependent” collaborative performances. Med Educ. 2021;55:1123–1130.
