Development of a reliable simulation-based test for diagnostic abdominal ultrasound with a pass/fail standard usable for mastery learning
DOI 10.1007/s00330-017-4913-x
ULTRASOUND
Eur Radiol

Abstract
Background This study aimed to develop a test with validity evidence for abdominal diagnostic ultrasound with a pass/fail standard to facilitate mastery learning.
Method The simulator had 150 real-life patient abdominal scans, of which 15 cases with 44 findings were selected, representing level 1 from The European Federation of Societies for Ultrasound in Medicine and Biology. Four groups of experience levels were constructed: novices (medical students), trainees (first-year radiology residents), intermediates (third- to fourth-year radiology residents) and advanced (physicians with an ultrasound fellowship). Participants were tested in a standardized setup and scored by two blinded reviewers prior to an item analysis.
Results The item analysis excluded 14 diagnoses. Both internal consistency (Cronbach’s alpha 0.96) and inter-rater reliability (0.99) were good, and there were statistically significant differences (p < 0.001) between all four groups, except the intermediate and advanced groups (p = 1.0). There was a statistically significant correlation between experience and test scores (Pearson’s r = 0.82, p < 0.001). The pass/fail standard failed all novices (no false positives) and passed all advanced participants (no false negatives). All intermediate participants and six out of 14 trainees passed.
Conclusion We developed a test for diagnostic abdominal ultrasound with solid validity evidence and a pass/fail standard without any false-positive or false-negative scores.

Key Points
• Ultrasound training can benefit from competency-based education based on reliable tests.
• This simulation-based test can differentiate between competency levels of ultrasound examiners.
• This test is suitable for competency-based education, e.g. mastery learning.
• We provide a pass/fail standard without false-negative or false-positive scores.

Keywords Ultrasonography · Abdomen · Simulation training · Education, medical · Radiology

* Mia L. Østergaard
mlo@dadlnet.dk

1 Department of Radiology, Copenhagen University Hospital, Rigshospitalet, Blegdamsvej 9, afd. 2023, 2100 Copenhagen O, Denmark
2 Department of Clinical Physiology, Nuclear Medicine and PET, Copenhagen University Hospital, Rigshospitalet, Blegdamsvej 9, 2100 Copenhagen, Denmark
3 Copenhagen Academy for Medical Education and Simulation CAMES, The Capital Region of Denmark, Blegdamsvej 9, 2100 Copenhagen, Denmark

Introduction

Abdominal ultrasound examinations are associated with a risk of both false-positive and false-negative findings, with potentially grave consequences for diagnosis and treatment. While ultrasound itself is a safe modality, false findings can lead to additional testing or provide inappropriate reassurance that can be associated with serious sequelae, such as prolonged or more serious illness and anxiety, as well as unnecessary radiation exposure or invasive procedures [1, 2].
The value of an ultrasound examination is dependent upon the skills of the examiner. The acquisition of these skills, as well as testing and ongoing maintenance, should be based upon a structured approach that prioritises objective assessment of competency [3, 4]. Clinical competence has traditionally been achieved with the use of the apprenticeship model
(‘see one, do one, teach one’), and educational goals have been set as a fixed timeframe or a number of procedures performed. In modern medicine, this approach has been increasingly questioned due to several important factors, including limited resident working hours, fatigue, supervision shortage, patient safety concerns, and an increasing focus on the reduction of human errors [5, 6]. As a result, competency-based education has emerged. This approach focuses on continuous competency assessment, as well as the establishment of competency-based goals. The latter provides a basis for the educational approach called mastery learning, which focuses on training until a pre-defined competency level is reached. Mastery learning ensures that the individual learners all achieve the same competence level, but not necessarily at the same pace [7].
Mastery learning requires a test in order to ensure skill acquisition at a sufficient level, as well as to help identify individual training needs and to serve as a well-defined goal. Goal setting is a crucial factor in skill acquisition, alongside motivation, feedback and opportunity for repetition [3]. Given that mastery learning is structured around a measurement of competency, it is critical to ensure that the test results can be trusted and represent a true reflection of competency. In other words, the test must demonstrate solid validity evidence. The framework by Messick [8] identifies five sources of validity evidence: (1) Content: does the test represent the relevant curriculum? (2) Response process: on what grounds are the test results interpreted? (3) Internal structure: is the test reliable and generalizable? (4) Relation to other variables: correlations within the test or to other assessment tools or measurements. (5) Consequences: who, how and on what grounds does the test score have an impact [9]? In a systematic review from 2015, none of the identified studies on simulation-based abdominal ultrasound training demonstrated a high level of evidence, and no tests with validity evidence were used [10]. To the best of our knowledge, there is no standardized test of competency in abdominal ultrasound with solid validity evidence. The aim of this study was to develop a test with validity evidence for abdominal diagnostic ultrasound and to establish a pass/fail standard to facilitate mastery learning.

Material and method

The study was approved by The Danish Ethical Committee with an exemption letter (protocol H-15013261). Test development was based on two identical simulators manufactured by Schallware (station 64; version 10013) and provided by the research fund from the Department of Radiology, Rigshospitalet. The simulators resemble a diagnostic ultrasound machine and consist of a hard drive, a keyboard, a sensor table with a mannequin torso, two touch screens and a mock ultrasound probe. Based on probe positioning, a pre-acquired scan is shown on one screen while the other screen replicates the buttons of an ultrasound machine and displays any written information. The scans are of real-life patients and each case is constructed on the basis of up to 2,000 raw B-mode scans [11]. No dynamic movements were simulated and the Doppler technique was not available.
The simulators have 150 abdominal cases that were all reviewed by a first-year radiological resident (M.L.Ø.) who selected 50 cases representing a variety of pathologies related to the liver, gallbladder, bile ducts, pancreas, spleen, urinary tract and pelvis [12].
An advanced ultrasound radiologist within the research group (M.B.N.) prioritized the 50 cases and, ultimately, agreement was reached on a group of 15 cases, including 44 findings that collectively represent the recommended level 1 knowledge from The European Federation of Societies for Ultrasound in Medicine and Biology [12] (Table 1). A short patient history and reason for referral were provided for each case. A maximum of 6 minutes was allowed for scanning, with no time limit for writing the answers. A pilot test was completed by a fourth-year radiology resident (K.R.N.) and minor changes were made based on the resident’s feedback and with agreement from all study authors.
Four study groups were created, each with a different level of ultrasound experience and based on the minimum training requirements as defined by the European Federation of Societies for Ultrasound in Medicine and Biology (EFSUMB) [12]:

1. Novices: Medical students with little or no ultrasound experience.
2. Trainees: First-year radiology residents who have completed an introduction to clinical ultrasound, including 4–8 weeks of focused training.
3. Intermediates: Third- or fourth-year radiology residents who have completed general ultrasound training, including a minimum of 4–12 months of focused training; corresponding to EFSUMB level 1 clinicians.
4. Advanced: Fully specialised radiology physicians who have completed an ultrasound fellowship and have a minimum of 3 years of experience, as well as current employment at an ultrasound clinic; corresponding to EFSUMB level 2 or level 3 clinicians.

We estimated a maximum of 17 correct diagnoses in group 1 and a minimum of 39 correct diagnoses in group 4, with a power of 0.9 provided for groups of 14. Data were collected during December 2015–May 2016. All eligible physicians and residents in southern and eastern Denmark were recruited by phone, e-mail or in educational groups. Medical students were recruited through the weekly university paper. All participants provided written informed consent and stated their
experience level. They were excluded if group criteria were not met or any additional ultrasound training exceeded the maximum for their group. Participants did not receive any compensation.

Table 1 Test information showing diagnostic findings: case number in the test (Case No.) and in the simulator (Case No. in Sim.), diagnostic finding numbers before (Org. Diag. No.) and after (Final Diag. No.) item analysis, and item difficulty and discrimination number (Item Diss. No.). Diagnostic findings included in the final test are marked with a grey background
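The item difficulty and discrimination numbers in the Table 1 caption are standard item-analysis statistics. As an illustrative sketch only (not the authors’ SPSS procedure — the function and variable names here are invented for illustration), difficulty is commonly computed as the proportion of participants scoring an item correct, and discrimination as the corrected item–total correlation:

```python
from statistics import mean, pstdev

def item_analysis(responses):
    """responses: list of per-participant lists of 0/1 item scores.
    Returns one (difficulty, discrimination) pair per item:
    difficulty is the proportion answering correctly; discrimination
    is the correlation of the item with the rest-of-test score."""
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]
    stats = []
    for i in range(n_items):
        item = [r[i] for r in responses]
        rest = [t - x for t, x in zip(totals, item)]  # corrected item-total
        difficulty = mean(item)
        # Pearson correlation between item scores and rest-of-test scores
        mi, mr = mean(item), mean(rest)
        cov = mean((x - mi) * (y - mr) for x, y in zip(item, rest))
        denom = pstdev(item) * pstdev(rest)
        discrimination = cov / denom if denom else 0.0
        stats.append((difficulty, discrimination))
    return stats
```

Items with extreme difficulty or low discrimination are the usual candidates for exclusion, mirroring how diagnoses were dropped from the final test.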
CI confidence interval

All participants were given a unique test ID that was randomly assigned by computer. They were introduced to the simulator by one radiological resident (M.L.Ø.) using two standardized simulation cases that were not included in the test. Written answers to the test questions were provided and answer sheets were automatically locked away after each case. All tests were scored by two blinded reviewers using a list of correct diagnoses. There was a minimum of 0 and a maximum of 44 points, representing the total of the 44 findings in the 15 cases. Test scores were calculated as the sum of the correct diagnoses. Incorrect diagnoses were noted separately. Empty answer boxes were given zero points (Table 1).
An item analysis was performed based on categories by Downing et al., and for each reviewer all the diagnoses were placed into four categories according to their difficulty and their ability to discriminate between experience levels, with category one being the optimal category (high discriminatory level and appropriate difficulty) [8]. According to Downing et al.’s parameters, the decision was made to include all category one, two and three diagnoses in the test. Selected category four diagnoses were individually chosen by an advanced ultrasound radiologist (M.B.N.) for inclusion in the test.
Statistical analysis was performed by two authors (L.K. and M.L.Ø.) using SPSS version 22 (IBM, Armonk, NY, USA). Cronbach’s alpha was calculated using all included diagnoses in order to determine internal consistency, and using the diagnosis scores from both raters in order to determine inter-rater reliability. Relationship to other variables was explored by comparing scores (between participants and inter-rater), as well as incorrect diagnoses, from all groups with a one-way analysis of variance (ANOVA). The latter was done with corrections for multiple comparisons (Bonferroni). Correlation between scanning experience in weeks and test scores was calculated with Pearson’s r.
Consequence was considered with the establishment of a pass/fail standard. This standard was determined using the contrasting groups’ method, which allows later adjustment if needed, and was based on mean scores and their associated standard deviations for group 1 and group 4 [8].

Results

The item analysis excluded 14 diagnoses. Five category four diagnoses were included by an advanced ultrasound radiologist (M.B.N.) who identified the five diagnoses as essential curriculum content (Table 1).
The internal structure of the test was very good, with a Cronbach’s alpha of 0.96 for internal consistency and 0.99 for inter-rater reliability. The ANOVA (Bonferroni) showed statistically significant differences (p < 0.001) between test scores for all four groups, with the exception of those for the intermediate and advanced groups (p = 1.0) (Fig. 1). As shown in Fig. 2, a highly statistically significant correlation was seen between scanning experience in weeks and mean test scores, with Pearson’s r = 0.82 (p < 0.001) (Table 2).
A pass/fail standard was established with a test score of 14 correct diagnoses based on the mean test scores from the novice and advanced groups (Fig. 3). Consequences were consistent with expectations: all novices failed the test (no false positives) and all advanced participants passed (no false negatives). All intermediate participants, as well as six out of 14 trainees, passed the test.
As previously noted, all incorrect diagnoses were registered and analysed separately. A post hoc multiple comparison test (Bonferroni) based on the mean incorrect scores demonstrated a statistically significant difference between novices and intermediates (p = 0.02). However, the comparison between novices and advanced participants fell just short of statistical significance (p = 0.053). There were no differences between any other groups (p ≥ 0.8). A maximum of ten incorrect diagnoses was set as an additional criterion for passing the test, with the subsequent consequence of failing one additional trainee, but without any consequences for the novice, intermediate or advanced groups.
The participating medical students were all from the University of Copenhagen, and physicians were from ten different hospitals and two different private practices in southern and eastern Denmark.
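The contrasting groups’ method sets the cutoff where the score distributions of the two contrasting groups (here novices and advanced examiners) intersect, balancing false positives against false negatives. A minimal sketch under a normality assumption — the function name and the group means and standard deviations below are invented placeholders, not the study’s data:

```python
import math

def contrasting_groups_cutoff(mean_fail, sd_fail, mean_pass, sd_pass):
    """Return the score where two normal pdfs intersect between the
    two group means (contrasting groups' standard-setting method)."""
    # Equating N(x; m1, s1) and N(x; m2, s2) and taking logs gives a
    # quadratic a*x^2 + b*x + c = 0 in the cutoff score x.
    a = 1 / (2 * sd_fail**2) - 1 / (2 * sd_pass**2)
    b = mean_pass / sd_pass**2 - mean_fail / sd_fail**2
    c = (mean_fail**2 / (2 * sd_fail**2)
         - mean_pass**2 / (2 * sd_pass**2)
         - math.log(sd_pass / sd_fail))
    if abs(a) < 1e-12:            # equal SDs: midpoint between the means
        return -c / b
    roots = [(-b + s * math.sqrt(b**2 - 4 * a * c)) / (2 * a) for s in (1, -1)]
    # keep the intersection that lies between the two group means
    return next(x for x in roots if mean_fail < x < mean_pass)

# Illustrative placeholder values only (not the study's data):
cutoff = contrasting_groups_cutoff(8.0, 3.0, 25.0, 5.0)  # ≈ 14.8
```

With equal standard deviations the cutoff reduces to the midpoint of the two means; as noted above, the method also allows the standard to be adjusted later, e.g. shifted toward the competent group to make the standard more conservative.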
Limitations

Conclusion
2. Torloni MR, Vedmedovska N, Merialdi M, Betrán AP, Allen T, González R et al (2009) Safety of ultrasonography in pregnancy: WHO systematic review of the literature and meta-analysis. Ultrasound Obstet Gynecol 33:599–608
3. Anders Ericsson K (2008) Deliberate practice and acquisition of expert performance: a general overview. Acad Emerg Med 15:988–994
4. McGaghie WC (2015) Mastery learning: it is time for medical education to join the 21st century. Acad Med 90:1438–1441
5. European Society of Radiology (ESR) (2013) Organisation and practice of radiological ultrasound in Europe: a survey by the ESR Working Group on Ultrasound. Insights Imaging 4:401–407
6. Garg M, Drolet BC, Tammaro D, Fischer SA (2014) Resident duty hours: a survey of internal medicine program directors. J Gen Intern Med 29:1349–1354
7. McGaghie WC, Miller GE, Sajid AW, Telder TV (1978) Competency-based curriculum development on medical education: an introduction. Public Health Pap 68:11–91
8. Downing SM, Yudkowsky R (2009) Assessment in health professions education, 1st edn. Routledge, New York, p 108 and p 143
9. Ghaderi I, Manji F, Park YS, Juul D, Ott M, Harris I et al (2015) Technical skills assessment toolbox: a review using the unitary framework of validity. Ann Surg 261:251–262
10. Østergaard M, Ewertsen C, Konge L, Albrecht-Beste E, Bachmann Nielsen M (2016) Simulation-based abdominal ultrasound training – a systematic review. Ultraschall Med 37:253–261
11. Schallware (2016) Specifications of simulator. Schallware, Germany. Available via http://www.schallware.com/
12. European Society of Radiology (2015) Guidelines & recommendations - Appendix 5. EFSUMB. Available via http://www.efsumb.org/guidelines/guidelines01.asp
13. Thinggaard E, Bjerrum F, Strandbygaard J, Gögenur I, Konge L (2015) Validity of a cross-specialty test in basic laparoscopic techniques (TABLT). Br J Surg 102:1106–1113
14. Thomsen ASS, Kiilgaard JF, Kjaerbo H, la Cour M, Konge L (2015) Simulation-based certification for cataract surgery. Acta Ophthalmol 93:416–421
15. Dyre L, Nørgaard LN, Tabor A, Madsen ME, Sørensen JL, Ringsted C et al (2016) Collecting validity evidence for the assessment of mastery learning in simulation-based ultrasound training. Ultraschall Med 37:386–392
16. Jacobsen ME, Andersen MJ, Hansen CO, Konge L (2015) Testing basic competency in knee arthroscopy using a virtual reality simulator: exploring validity and reliability. J Bone Joint Surg Am 97:775–781
17. Konge L, Clementsen PF, Ringsted C, Minddal V, Larsen KR, Annema JT (2015) Simulator training for endobronchial ultrasound: a randomised controlled trial. Eur Respir J 46:1140–1149
18. Konge L, Albrecht-Beste E, Nielsen MB (2014) Virtual-reality simulation-based training in ultrasound. Ultraschall Med 35:95–97
19. Ericsson KA (2008) Deliberate practice and acquisition of expert performance: a general overview. Acad Emerg Med 15:988–994
20. Bransford JD, Schwartz DL (1999) Rethinking transfer: a simple proposal with multiple implications. Rev Res Educ 24:61