2022 in Silico Review
March 2023
1. Introduction
are not routinely recorded nor stored further complicates the curation of large
data sets that are needed for training machine learning algorithms, including deep
convolutional neural networks and transformer architectures, which are the backbone of
modern intelligent systems.
In silico training circumvents these obstacles through the use of computer
simulation as an ethical, controlled, and scalable source of data for intelligent systems
in MIS. This approach is already prominent in related problems of general robotic
manipulation [41, 42] and self-driving [43], each of which contends with challenges
specific to the given task, environment, and data modality. In the context of
MIS, numerous simulation frameworks have been developed by researchers either
with machine learning specifically in mind [44–46] or as training tools for clinicians
and subsequently adapted for machine learning applications [29, 37]. One of the
most mature examples is the SimNow software suite, developed for the da Vinci
Surgical Robot, which includes training routines for 33 surgical skills [47], as well
as complete procedures (Intuitive Surgical Operations, Inc.; Sunnyvale, California).
The increasing maturity of this and other simulation frameworks for MIS has
heightened interest in in silico training for intelligent systems as well as clinicians.
In this article, we review the progress in and challenges facing in silico training of
intelligent MIS systems. Given the differences in application, reviews of simulation
training for general robotics, such as [48], focus on the use of RGB(-D) imaging,
which generally does not apply in the context of MIS. Further, although previous
reviews include recent advances in MIS [49–60], robotic-assisted MIS [56, 61–64],
machine learning in surgical interventions [34, 35, 65–73], or surgical simulation for
human training purposes [74–77], in silico training specifically for intelligent MIS
systems remains an emerging area deserving of an introduction. We focus this review
on frameworks and successful applications in three imaging modalities which have
received the bulk of researchers’ attention, namely endoscopy, ultrasound, and X-ray.
For each simulation framework, we review the image quality in terms of sim-to-real
transfer, simulation dynamism, computational resources, and time cost.
The outline of the review is as follows. In Section 2, we introduce overarching
concepts relevant to simulation. Diving into recent progress, Sections 3, 4, 5, and
6 explore the various simulation frameworks and their applications for training
intelligent systems in MIS, covering endoscopic and stereo microscopic, ultrasound (US),
X-ray, and computed tomography (CT) imaging, respectively. Table 1 summarizes
the available simulation frameworks for each modality. We first introduce frameworks
which have received the most attention and subsequently highlight recent alternatives
that may be of interest; we focus on these modalities because they have attracted
the most in silico training effort from the community. In Section 7, we
call attention to the capabilities of systems developed for different modalities and
speculate on future directions in this exciting area.
Figure 1: Simulation methods for 2D image formation generally adhere to two major
approaches based on the data structure: (a) volumetric simulation and (b) sparse
simulation, which may use point clouds, surface meshes (as shown here), spline-based
object models, or other spatial data structures. In (a), dense grid data makes up a
volume, which individual rays “march” through to model energy-matter interactions.
In (b), a sparse triangle mesh models the surface of an object, which one or more
rays intersect at a finite set of points. The latter approach is commonly used in
computer graphics, with open source tools such as Blender [81] available for creating
3D scenes with complicated meshes; such meshes can also be sampled from statistical
shape models [82].
in the optical domain. In Fig. 1a, the computation of a given pixel value involves
“marching” along the corresponding ray and processing the contributions at each
point in the volume [83]. When considering interaction effects, like scattering, this
technique can result in photo-realistic rendering of complicated scenes, but until
recently it was computationally prohibitive for generating large datasets. The use
of increasingly powerful GPU devices has made volumetric methods more feasible
for a variety of applications. Fig. 1b, by contrast, considers the point where the ray
intersects with a surface mesh, a more efficient problem in terms of computational
time and memory [84].
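To make the volumetric approach of Fig. 1a concrete, the following Python sketch accumulates attenuation along a single ray through a dense volume and applies the Beer-Lambert law; the nearest-neighbor lookup, step size, and units are simplifying assumptions for illustration rather than the implementation of any particular framework.

    import numpy as np

    def march_ray(volume, spacing, origin, direction, step=0.5, n_steps=512):
        """Accumulate attenuation along one ray through a dense volume (Fig. 1a).

        volume    -- 3D array of attenuation coefficients (assumed units of 1/mm)
        spacing   -- voxel size in mm along each axis
        origin    -- ray origin in world coordinates (mm)
        direction -- ray direction (need not be normalized)
        """
        direction = np.asarray(direction, dtype=float)
        direction = direction / np.linalg.norm(direction)
        origin = np.asarray(origin, dtype=float)
        line_integral = 0.0
        for i in range(n_steps):
            p = origin + i * step * direction              # world-space sample point
            idx = np.round(p / spacing).astype(int)        # nearest-neighbor voxel lookup
            if np.any(idx < 0) or np.any(idx >= volume.shape):
                continue                                   # sample lies outside the volume
            line_integral += volume[tuple(idx)] * step
        return np.exp(-line_integral)                      # transmitted intensity (Beer-Lambert)

Frameworks such as DeepDRR [44] implement the same line integral far more efficiently on the GPU, with interpolation and material- and energy-dependent attenuation; this sketch only conveys the basic data flow.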
The benefits of sparse simulation methods primarily stem from their increased
computational efficiency, which can enable simulations with more capable dynamic
interactions, while volumetric methods contain more detail for rendering realistic
images. A sparse patient model, for example, consists of enclosed meshes segmenting
each organ, as in [82]. This enables fast simulation of physical interactions, such
as collision detection and realistic tissue deformation, and, for visible light imaging
modalities, in which reflective properties of surfaces are responsible for a large portion
of image formation, it can provide the basis for photorealistic image rendering [81].
Simulating transmissive imaging modalities, however, such as X-ray and US, requires
modeling the energy-matter interactions throughout a medium, including the interior
of surface meshes. For sparse models, this can be accomplished by modeling the
distribution of attenuation properties within each region, e.g. by assuming uniform
density and material composition or using splines [85], but volumetric patient models
are inherently better suited for this task because they contain dense material property
data. Thus for transmissive modalities, volumetric models (with sufficient spatial
resolution) enable more realistic simulation of image formation, while sparse methods
are sufficient for realistic rendering of visible light images and support more powerful
simulation dynamics.
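For the sparse representation of Fig. 1b, the core geometric operation is a ray-triangle intersection test. A standard Möller-Trumbore implementation is sketched below for a single triangle; inputs are NumPy 3-vectors, and a full renderer would accelerate this with a bounding-volume hierarchy.

    import numpy as np

    def ray_triangle_intersection(origin, direction, v0, v1, v2, eps=1e-9):
        """Return the distance along the ray to the triangle (v0, v1, v2), or None."""
        e1, e2 = v1 - v0, v2 - v0
        p = np.cross(direction, e2)
        det = np.dot(e1, p)
        if abs(det) < eps:                      # ray is parallel to the triangle plane
            return None
        inv_det = 1.0 / det
        s = origin - v0
        u = np.dot(s, p) * inv_det
        if u < 0.0 or u > 1.0:                  # intersection lies outside the triangle
            return None
        q = np.cross(s, e1)
        v = np.dot(direction, q) * inv_det
        if v < 0.0 or u + v > 1.0:
            return None
        t = np.dot(e2, q) * inv_det
        return t if t > eps else None           # distance along the (unit) ray direction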
The quality of imaging simulation directly affects the sim-to-real performance
of the system, that is its ability to train in silico but perform on the corresponding
domain of real images. In machine learning, this problem is referred to as domain
adaptation. One domain adaptation approach, which we find to be commonly used
in the papers described here, is the use of generative adversarial networks (GANs) to
translate images from the simulation domain to the real, using an unsupervised cycle-
consistent loss [86]. Separately, it should be noted that the sim-to-real capability
of a model differs from its generalizability. The former is a matter of image
appearance and tool-to-tissue interaction (if considered), while the latter describes an
intelligent system’s performance on samples and situations not seen during training,
either in the simulation or real domain. As an example, a bone segmentation
algorithm may achieve high sim-to-real performance but still fail when confronted
with anomalous anatomy or fractures, regardless of the domain. The differences in
imaging parameters and techniques between institutions also introduce the need for
generalizability [87]. In general, it is understood that training a generalizable model
requires sufficient variation in the simulation parameters, including image formation
characteristics as well as patient demographics, pathologies, and anatomical features.
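To make the GAN-based sim-to-real translation mentioned above concrete, the sketch below shows the cycle-consistency term of [86] in PyTorch; the generator networks G_sim2real and G_real2sim and the weighting are placeholders for illustration, not any specific published implementation.

    import torch.nn.functional as F

    def cycle_consistency_loss(G_sim2real, G_real2sim, sim_batch, real_batch, lam=10.0):
        """Unsupervised cycle-consistency term used alongside the adversarial losses.

        G_sim2real, G_real2sim -- image-to-image generator networks (hypothetical)
        sim_batch, real_batch  -- unpaired batches of simulated and real images
        """
        # Translate each domain to the other and back again.
        sim_cycle = G_real2sim(G_sim2real(sim_batch))
        real_cycle = G_sim2real(G_real2sim(real_batch))
        # Penalize the round trip for changing image content.
        return lam * (F.l1_loss(sim_cycle, sim_batch) + F.l1_loss(real_cycle, real_batch))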
In order to facilitate in silico training, it is necessary to obtain ground truth
annotations of the image. For 2D imaging modalities, this often involves propagating
information from the 3D simulation, where positions and orientations for objects of
Figure 2: Pinhole camera geometry often used for modeling endoscopic, stereoscopic,
and X-ray imaging. The camera projection matrix P = K[R|t] can be used to
propagate 3D annotations, like the landmark x̃, to the image plane.
interest are known, to the 2D image being simulated. For example, in endoscopic,
stereoscopic, and X-ray imaging, the intrinsic parameters of the camera geometry—
its focal length and pixel spacing—can be modeled in the 3 × 3 intrinsic matrix K,
while the position and orientation of the camera are encoded in the extrinsic matrix
[R|t]. Thus for a homogeneous 3D point x̃ = [x, y, z, 1] in the simulated world, such
as an anatomical landmark, the corresponding homogeneous point in the image can
be determined as
ũ = K [R|t] x̃, (1)
as in Fig. 2. Similar operations can determine image-space projections for slice-
to-volume transformations, as they arise in ultrasound, and extend to different
annotations such as lines, volumes, and 3D segmentations, enabling richly annotated,
large-scale datasets to be generated from relatively few annotations of the underlying
3D data.
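A direct translation of Eq. (1) into NumPy, assuming the intrinsic matrix K, rotation R, and translation t of the simulated camera are known, might read as follows.

    import numpy as np

    def project_landmark(K, R, t, x_world):
        """Project a 3D landmark into pixel coordinates via u = K [R|t] x.

        K       -- 3x3 intrinsic matrix (focal lengths and principal point)
        R, t    -- 3x3 rotation and 3-vector translation (camera extrinsics)
        x_world -- 3D landmark position in world coordinates
        """
        x_h = np.append(np.asarray(x_world, dtype=float), 1.0)            # homogeneous 3D point
        P = K @ np.hstack([R, np.asarray(t, dtype=float).reshape(3, 1)])  # 3x4 projection matrix
        u_h = P @ x_h                                                     # homogeneous image point
        return u_h[:2] / u_h[2]                                           # pixel coordinates (u, v)

The same matrices define the viewing geometry of a simulated endoscope or C-arm, so a single annotated 3D landmark yields a 2D label for every rendered view.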
support robots with closed-link kinematic chains and redundant mechanisms, which
are common in surgical robotics but not supported by popular robotics frameworks.
In addition, the simulator supports non-rigid bodies, such as soft tissue, and
built-in models of real-world surgical robots. These properties have supported
immersive simulation of lateral skull base surgery for simultaneous training and
data generation, suitable for training machine learning models [46].
Building upon the AMBF, Munawar et al. [99] propose a unified simulation
platform for robotic surgery with the da Vinci robot to facilitate the prototyping
of various scenes, including an example suturing task. Their contributions enable
real-time control and collision detection within the simulation, as well as generation
of ground truth data for machine learning, enabling human-driven simulation to
provide kinematic as well as image data over extended sequences.
To support simultaneous human training and in silico collection of data
for machine learning applications, Munawar et al. [28] further extend the
aforementioned functionality to include haptic feedback for lateral skull base
surgery as well as a simulated stereo microscope to enable depth perception in the
operating field. They demonstrate the utility of this approach for the purposes of
training intelligent systems by training a stereo depth network (STTR [126]) on the
simulation images.
Varier et al. [100] propose AMBF-RL to support real-time reinforcement learning
(RL) for surgical robotic tasks. They validate their approach using in silico training
of an RL agent for debris removal with the dVRK Patient Side Manipulator (PSM)
and successfully transfer the optimal policy to a real robot.
3.2. Blender
Cartucho et al. [98] introduce VisionBlender, a software plugin that allows users to create
highly realistic computer vision datasets with segmentation maps, depth maps, and
other ground truth, using the 3D modeling software, Blender [81], as a backbone.
VisionBlender is designed with robotic surgical applications specifically in mind,
supporting playback through the ROS platform to simulate real-time data collection
from a da Vinci robot via dVRK. [127] utilize VisionBlender [98] to generate a
simulated laparoscopic dataset to evaluate the performance of their improved marker
for surgical tool tracking.
Chen et al. [25] use Blender [81] to simulate smoke in endoscopic video, such
data collection while reporting similar challenges, namely the lack of tool-to-tissue
interaction through the re-play paradigm.
Wu et al. [136] introduce a soft tissue simulator that is unified with the dVRK
framework and uses the SOFA framework [137] as the physics simulator. Using robotic
vision and kinematic data, they train a network to learn correction factors for finite
element method (FEM) simulations from the discrepancy between simulations and real
observations. In their follow-on work [138], the authors present a faster approach,
where they implement a step-wise framework in the network for interactive soft-tissue
simulation and real-time observations.
Several datasets are available for visible light imaging modalities, based on in
silico simulation, although the simulation framework itself may not be available.
The Surgical Visual Domain Challenge had participants transfer skills from
simulated da Vinci surgery (based on the SimNow simulator) to real endoscopic video
[139]; the data are available as the “SurgVisDom dataset.” Madapana et al. [140] investigate
sim-to-real skill transfer in the context of dexterous surgical skills, simulating a dataset
for the aforementioned peg transfer task and collecting corresponding instances on
real robots, namely the Taurus II and YuMi. Their DESK dataset is publicly
available. Rahman et al. [141] use a simulated OpenAI Gym environment and the
real-world DESK dataset to evaluate robotic activity classification methods.
4. Ultrasound Imaging
models of wave propagation. They can rely on sparse representations of data, with
finite point sources at known locations, or volumetric patient data based on CT or
MRI. In the latter case, it is necessary to process the original image, which measures
either X-ray attenuation in the case of CT or nuclear spin in MRI, to infer tissue
elasticity and reflectance properties relevant to US. If an organ segmentation is
available, the ultrasound absorption and attenuation can be estimated separately
for each tissue type based on values in the literature [152]. A distinctive trait of US
images is speckle, which arises from micro-inhomogeneities in tissue [153]. Although
such structure is smaller than the spatial resolution of CT or MRI data, it can likewise
be modeled based on work done for tissue characterization purposes [154, 155].
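As a sketch of this segmentation-based assignment, the mapping below converts an organ label volume into speed-of-sound and attenuation maps; the labels and numerical values are placeholders, not the literature values of [152].

    import numpy as np

    # Hypothetical per-tissue acoustic properties (speed of sound [m/s],
    # attenuation [dB/cm/MHz]); real values would be taken from the literature.
    ACOUSTIC_PROPERTIES = {
        0: (1540.0, 0.5),   # background soft tissue
        1: (1580.0, 0.7),   # muscle
        2: (3500.0, 10.0),  # bone
    }

    def labels_to_acoustic_maps(label_volume):
        """Convert an organ segmentation into acoustic property volumes for US simulation."""
        speed = np.zeros(label_volume.shape, dtype=np.float32)
        attenuation = np.zeros(label_volume.shape, dtype=np.float32)
        for label, (c, a) in ACOUSTIC_PROPERTIES.items():
            mask = label_volume == label
            speed[mask] = c
            attenuation[mask] = a
        return speed, attenuation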
In general, the US community prefers MATLAB toolkits, although Python
wrappers exist for several frameworks (Python being preferred by the machine
learning community due to [156, 157]). Below, we review these tools and their
applications for image-guided minimally invasive surgery.
4.1. Field II
Originally developed starting in 1991, Field [101–103] represents one of the early
complete solutions for computational simulation of ultrasound image formation.
Based on the Tupholme-Stepanishen method [158–160], Field can simulate any kind
of transducer geometry and excitation, with fast computation made possible by a far-
field approximation. In 1997, Field II [161] parallelizes execution to reduce simulation
time significantly, eventually enabling simulation of 128 images in 42 seconds with
full precision (0.33 seconds per image) in 2014 [162]. As of December 2022, Field II
continues to enjoy a wide user base and regular software updates, compatible with
modern processors and major operating systems (Windows, Mac OS, and Linux).
Parallel execution and a native Python implementation are available for commercial
use as a part of Field II Pro.‡
The fast inference time of deep neural networks (DNNs) means that real-time
applications can improve image quality in the operating room. Hyun et al. [163]
use Field II to simulate a training set for the purpose of speckle reduction in B-
mode images, introducing log-domain loss functions tailored for US. They showed
that speckle reduction using their DNN trained in silico outperformed the traditional
delay-and-sum and nonlocal means method on real images, in terms of preservation
‡ http://field-ii.dk/.
4.2. k-Wave
One way to reduce the computation necessary for US imaging is the k-space
pseudospectral method, which discretizes the equations modeling nonlinear wave
propagation [170]. The open-source tool k-Wave [105] is a MATLAB toolbox,
capable of simulating physically realistic images quickly, using a planar detector
geometry based on the fast Fourier transform. Unlike other tools, k-Wave
improves computational speed through parallel execution on graphics processing
units, simulating 1000 time-steps for a grid size of 128³ in 7 minutes (approximately
0.4 s per time-step), a more than 2.5-fold speedup compared to multi-threaded CPU
execution.
US image guidance often requires visualization of point-like targets, such as
needle cross sections or catheters, which can be confused with point-like artifacts
commonly present in the surgical setting. Allman et al. [171] show the advantage
of DNNs in this area, which are able to distinguish true point sources with 96.67%
accuracy in phantom, after training in silico with k-Wave [105]. They achieved sub-
millimeter point source localization error (0.38 ± 0.25 mm), enabling visualization of
a novel artifact-free image in the context of MIS. A similar approach uses simulated
k-Wave data to precisely localize point-like vessel targets for guidance in MIS
[172].
The goal of distinguishing point sources with high accuracy may benefit from
photoacoustic imaging, where the absorption of optical or radio-frequency EM
excitations causes tissue to generate acoustic waves [173]. Combining in silico and in
vivo data for training, [174, 175] explore the use of light-emitting diodes (LEDs)
as excitation sources for photoacoustic visualization of clinical needles in MIS,
developing a DNN-based system that enhances the visualization and achieves a 4.3-times
higher signal-to-noise ratio compared to conventional reconstructions. Their semi-
synthetic approach allows for complete knowledge of the desired ground truth while
reducing the sim-to-real gap.
4.3. FOCUS
The Fast Object-Oriented C++ Ultrasound Simulator (FOCUS) is a fast US
simulator available for MATLAB [106]. It resolves large errors in ultrasound
simulation in the nearfield and at the transducer face using the fast nearfield
method [176] and time-space decomposition [177]. Comparing the simulations of
FOCUS and Field II reveals this difference for a sampling frequency as low as 25
MHz, where the impulse response calculation in Field II introduces aliasing artifacts
[178].
Because US images typically contain only 2D data from a linear array, resolving
3D pose based on the work in the previous section can be nontrivial. Arjas
et al. [179] propose a solution based on in silico training combined with a Kalman
filter to improve localization over continuous US acquisition, as is common in the
surgical setting. They train a DNN based on the FOCUS simulator [106], although
applications to inhomogeneous tissue would require simulation with the k-Wave
framework. Nevertheless, their approach shows promise for the critical task of
reconstructing 3D tool poses from 2D US images, achieving 0.3 mm maximum error
when transferring to real US images of needles submerged in water.
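The combination of per-frame DNN estimates with a Kalman filter can be pictured as in the sketch below; the constant-velocity motion model and noise magnitudes are illustrative assumptions, not the filter design of Arjas et al. [179].

    import numpy as np

    def kalman_step(x, P, z, dt, q=1e-3, r=0.1):
        """One predict/update step tracking a 3D needle tip from per-frame DNN estimates.

        x -- state [px, py, pz, vx, vy, vz];  P -- 6x6 covariance;  z -- measured position
        """
        F = np.eye(6)
        F[:3, 3:] = dt * np.eye(3)                      # constant-velocity motion model
        H = np.hstack([np.eye(3), np.zeros((3, 3))])    # only position is measured
        Q, R = q * np.eye(6), r * np.eye(3)
        x = F @ x                                       # predict
        P = F @ P @ F.T + Q
        y = z - H @ x                                   # innovation from the DNN estimate
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y                                   # update
        P = (np.eye(6) - K @ H) @ P
        return x, P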
Although an open source framework for this approach is not available, their method
is straightforward enough to be re-implemented with standard tools. Moreover, they
demonstrate that this approach, combined with image augmentation, achieved a
15.6% higher DICE score on real US images than networks trained with Field II,
while simulation of these images was 36,000 times faster, leveraging the fast Fourier
transform [184]. However, like Garcia-Peraza-Herrera et al. [133], this method only
simulates the added objects; complete ground truth for the underlying real images
is not available.
Rather than compute 2D US images from digital phantoms, Li et al. [27] propose
to use volumetric ultrasound acquisitions to sample slices for in silico training of an
autonomous system for achieving standard US views in spinal sonography. This
approach is efficient and results in highly realistic images, which can be sampled
dynamically from many viewpoints, although real patient scans are required. They
demonstrate that this in silico environment is suitable for deep reinforcement learning
(DRL), achieving standard views of the spine to within reasonable margins (5.18 mm /
5.25° in the intra-subject setting).
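A minimal version of this slice-sampling idea, assuming the volumetric acquisition is stored as a NumPy array and using SciPy for interpolation, could look as follows; the plane parametrization is illustrative rather than the probe model of [27].

    import numpy as np
    from scipy.ndimage import map_coordinates

    def sample_slice(volume, center, u_axis, v_axis, size=(256, 256), spacing=0.5):
        """Resample a 2D imaging plane from a 3D ultrasound volume.

        center          -- voxel coordinates of the plane center
        u_axis, v_axis  -- orthonormal in-plane direction vectors (voxel units)
        size, spacing   -- output image size in pixels and pixel spacing in voxels
        """
        center = np.asarray(center, dtype=float)
        u_axis = np.asarray(u_axis, dtype=float)
        v_axis = np.asarray(v_axis, dtype=float)
        h, w = size
        us = (np.arange(w) - w / 2) * spacing
        vs = (np.arange(h) - h / 2) * spacing
        uu, vv = np.meshgrid(us, vs)
        # Voxel coordinates of every pixel on the oblique plane.
        coords = (center[:, None, None]
                  + uu[None] * u_axis[:, None, None]
                  + vv[None] * v_axis[:, None, None])
        return map_coordinates(volume, coords, order=1, mode="nearest")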
The Verasonics Vantage ultrasound scanners are a US hardware family well-
suited to US research, due to the easy access to raw ultrasound data. Along with this
hardware, Verasonics supplies an US simulator that is integrated with their imaging
platform, making it an attractive research tool that could streamline translation
onto real devices, similar to PLUS [107, 110]. Although no theoretical work
exists describing their proprietary software, information regarding their approach
is available on their website.§
Recently, in 2022, [109, 110] propose SIMUS, an open-source ultrasound
simulator for MATLAB, as a part of the MATLAB ultrasound toolbox (MUST)
[185]. SIMUS simulates the acoustic pressure fields, performing comparably to Field
II, k-Wave, FOCUS, and Verasonics in terms of image realism. The simplicity and
online availability of this framework make it useful for pedagogical purposes as well
as research.
Finally, [186] demonstrate the sim-to-real capabilities of a robotic US imaging
system, learning based on RGB images and sensor readings rather than the US
images themselves. This approach reduces the simulation problem to one that is more
aligned with general robotic manipulation, although it will be an essential component
of more complete simulation of an intelligent MIS system. A full simulation of the
operating room as a dynamic in silico environment, i.e. a digital twin, will require
simulation of the given medical imaging modality as well as non-medical data sources
such as RGB cameras and force sensors.
§ https://verasonics.com/ultrasound-simulator/
5. X-ray Imaging
5.1. CONRAD
CONRAD is an early framework for simulating realistic radiographs,
providing the first unified library for cone-beam imaging that incorporates nonlinear
physical effects and projective geometry, written in Java and accelerated with
OpenCL [85]. Primarily intended for applications in CT reconstruction, CONRAD
relies on a spline-based object model to represent the attenuation properties of
patients and tools in space. This sparse data representation results in images that
are highly controllable but may lack the realism needed for sim-to-real transfer in
the X-ray domain. Nevertheless, in 2018 CONRAD was adapted for
in silico training of intelligent systems in cardiac statistical shape modeling [203],
digital subtraction angiography (DSA) [204], and motion reduction in diagnostic knee
imaging [205]. DSA refers to the acquisition of subsequent fluoroscopic images with
and without contrast agent, the subtraction of which yields a background-free image
focusing on the blood vessel [206]. Virtual, single-frame DSA removes the need for a
non-contrast acquisition by segmenting the foreground vessel from the background,
which Unberath et al. [204] accomplish automatically with a U-Net trained on
CONRAD. Finally, although Bier et al. [205] consider diagnostic applications,
specifically compensating for patient motion during load-bearing acquisition of
CBCT, their proposal to automatically detect anatomical landmarks from projective
images, based on in silico training, has since gained ground in MIS applications.
5.2. MC-GPU
Monte Carlo simulation enables realistic, physics-based simulation of radiation
transport, but requires significant compute time. Developed with the aim of
accelerating simulation of X-ray and CT images, MC-GPU features massively parallel
Monte Carlo simulation of photon interactions with volumetric patient models,
leveraging advancements in graphical processing units (GPUs) [111, 207]. Compared
to single-core CPU execution, MC-GPU achieves a maximum 27-fold speedup in
simulation time, using hardware available in 2009. Despite these advantages, MC-GPU
still requires enough compute time that it remains impractical for generating datasets
on the scale needed for training deep learning algorithms.
5.3. DeepDRR
To catalyze in silico training in the X-ray domain, Unberath et al. [44, 112] contribute
a Python framework for fast, physics-based synthesis of digitally reconstructed radiographs (DRRs) with sufficient realism
for sim-to-real transfer. While previous approaches that focused on image realism
employed Monte Carlo simulation of photon absorption and scatter, DeepDRR
approximates these effects in a physically realistic manner by projecting through
segmented CT volumes and subsequently estimating photon scatter with a DNN,
trained on Monte Carlo ground truth, which enables generation of tens of thousands
from the same view with and without metal implants in the spine, enabling a
DNN to inpaint radiographs and remove metal implants. This can improve 2D/3D
registration of the spine, wherein the sharp contrast of metal implants tends to inhibit
intensity-based registration methods.
Building on physics-based DRR synthesis, Toth et al. [216] show that domain
randomization (DR) can improve DRR-to-X-ray performance for cardiac 2D/3D
registration. They train an RL agent to iteratively update a cardiac model to align
with real radiographs, demonstrating higher stability when DR is applied during
training. The purpose of DR, which applies unrealistic image transformations to
the training set, is to introduce such large variation that the DNN avoids local,
domain-specific minima in the loss function. Previously mentioned work has similarly
utilized DR for fully automatic 2D/3D registration in the pelvis, based on anatomical
landmark detection [210].
The advantages of domain randomization are convincingly demonstrated in Gao et al. [217],
which shows that physics-based X-ray synthesis using DeepDRR, combined with strong DR,
is comparable to GAN-based domain adaptation and outperforms GAN-based domain adaptation
applied to conventional DRRs, although this work is not yet peer-reviewed. This is
advantageous because the image
transformations involved in “strong DR,” such as image inversion, blurring, warping,
and coarse dropout, among others, are computationally inexpensive, whereas GANs
require additional training with sufficient real images as an unlabeled reference.
They demonstrate this approach, coined “SyntheX,” on three representative tasks:
pelvic landmark detection for 2D/3D registration, detection and segmentation of a
continuum manipulator, and COVID-19 diagnosis from chest X-rays [217].
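The image-space transformations used for strong DR are computationally trivial; the sketch below applies a representative subset (inversion, blurring, coarse dropout), with probabilities and parameter ranges chosen arbitrarily for illustration rather than taken from SyntheX [217].

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def strong_domain_randomization(image, rng=None):
        """Apply deliberately unrealistic transformations to a simulated image in [0, 1]."""
        rng = rng or np.random.default_rng()
        out = image.astype(np.float32).copy()
        if rng.random() < 0.5:                            # random intensity inversion
            out = 1.0 - out
        if rng.random() < 0.5:                            # random Gaussian blur
            out = gaussian_filter(out, sigma=rng.uniform(0.5, 3.0))
        if rng.random() < 0.5:                            # coarse dropout patches
            h, w = out.shape
            for _ in range(int(rng.integers(1, 5))):
                ph = int(rng.integers(h // 8, h // 4))
                pw = int(rng.integers(w // 8, w // 4))
                y = int(rng.integers(0, h - ph))
                x = int(rng.integers(0, w - pw))
                out[y:y + ph, x:x + pw] = rng.random()
        return out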
Inspired by [205], Kausch et al. [18] propose an intelligent system for automatic
C-arm repositioning, regressing a pose update based on intermediate landmark
detection to obtain standard views of the pelvis. Without additional fine-tuning,
their system obtained desired clinical views when evaluated on real X-rays, to within
inter-rater variability [18]. They refine this work in the domain of spine surgery,
introducing K-wire and pedicle screw implants as augmentations to the training set
[218]. K-wire simulation was accomplished through post-processing of the X-ray
image with quadratic Bézier curves, while pedicle screws were simulated in the DRR
synthesis process similar to Unberath et al. [112]. Related work in this area uses a
DNN to estimate the C-arm pose in simulation [219].
Recent applications of in silico training with DeepDRR continue to focus on
is learned based on real X-rays. As in RealDRR and XPGAN [227], they use a GAN
[86] to improve image realism as a final step. Although NeRPs are only used for
in silico training of a diagnostic intelligent system, this simulation framework has
potential applications in MIS, where novel view synthesis is a desirable capability. A
framework for their approach is not yet available.
To minimize radiation dose during CT imaging, Abadi et al. [228] propose a
volumetric, Monte Carlo simulation framework with device- and institution-specific
parameters for imaging, called DukeSim. Although intended for CT simulation,
DukeSim’s individual projections have been used in combination with voxelized
statistical shape models [82] to reduce variability in X-ray imaging due to exactly
these factors, which can affect the consistency and reliability of diagnostic imaging
[229].
Finally, although it is not specifically developed with machine learning in
mind, DRRGenerator [113] may yet be of interest to the community because of
its intuitive user interface and integration with the open-source medical imaging
software, 3D Slicer [108]. Currently, the popular DeepDRR tool requires users to
develop a sampling strategy of sufficiently varied views in order to guarantee view-
invariant DNN performance, as in [209]. With additional capabilities focused on
in silico training, DRRGenerator would be to X-ray image-guided interventions as
VisionBlender and AMBF+ are to endoscopic and stereo microscopic image-guided
procedures, respectively.
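One simple sampling strategy is to draw C-arm orientations uniformly from clinically plausible ranges and jitter the isocenter, as in the sketch below; the parameter ranges are placeholders rather than those of [209].

    import numpy as np

    def sample_carm_views(n_views, alpha_range=(-40.0, 40.0), beta_range=(-20.0, 20.0),
                          offset_mm=25.0, rng=None):
        """Sample random C-arm poses for generating varied DRR training views.

        alpha_range, beta_range -- orbital rotation and angulation ranges in degrees
        offset_mm               -- maximum isocenter jitter along each axis in mm
        """
        rng = rng or np.random.default_rng()
        views = []
        for _ in range(n_views):
            alpha = rng.uniform(*alpha_range)                      # orbital rotation
            beta = rng.uniform(*beta_range)                        # angulation
            offset = rng.uniform(-offset_mm, offset_mm, size=3)    # isocenter jitter
            views.append((alpha, beta, offset))
        return views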
Much of the interest in in silico training for MIS has focused on 2D imaging
modalities: endoscopy, US, and X-ray. We speculate that this is because large
datasets suitable for training DNNs are easier to obtain in the 2D domain, where
a single annotation of a 3D image can be propagated to thousands of samples.
Nevertheless, simulating 3D images and 3D physical interactions is of interest to
develop intelligent systems focused on intra-operative CT, MRI, and 3D US.
Toward this end, Lee et al. [230] propose a simulated environment for training
autonomous needle insertion robots, using RL. They model the deformation of a
beveled tip needle in a dynamic environment based on stochastic processes, providing
negative rewards for collisions with obstacles such as bone, and positive rewards
when it reaches the biopsy target. In the future, physics-based simulation of needle
insertion may support both CT-guided and, through a platform like DeepDRR [44],
X-ray-guided procedures.
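The reward structure described above can be summarized in a few lines; the numerical values and termination tolerance below are illustrative assumptions rather than those of Lee et al. [230].

    import numpy as np

    def needle_insertion_reward(tip_position, target, in_collision, tolerance=2.0):
        """Shaped reward for an RL needle-insertion agent (illustrative values).

        tip_position, target -- 3D positions in mm
        in_collision         -- callable returning True if the tip intersects a
                                forbidden structure such as bone
        """
        distance = float(np.linalg.norm(np.asarray(target) - np.asarray(tip_position)))
        if in_collision(tip_position):
            return -10.0                 # penalize collisions with obstacles
        if distance < tolerance:
            return 10.0                  # reaching the biopsy target
        return -0.01 * distance          # small shaping term encouraging progress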
CT guidance must contend with severe image artifacts introduced by metal
implants, such as pedicle screws. In order to minimize metal artifacts in intra-
operative cone-beam CT (CBCT), Zaech et al. [231] and Thies et al. [232] train a DNN
to adjust C-arm trajectories during image acquisition, using DeepDRR for in silico
training. Their method iteratively adjusts the out-of-plane angle of a robotic C-arm
to avoid “poor images” characterized by beam hardening, photon starvation, and
noise.
Recently, image-free guidance for MIS has been explored. Årsvold et al.
[233] simulate electrical properties of target tissue types in order to train an
intelligent system for minimally invasive lymphadenectomy, a surgical treatment for
cancer. Their system determines whether a lymph node is present underneath an electrical
impedance scanner, which can be deployed as part of a robotic assisted MIS system,
achieving 93.49% accuracy in ex vivo tissue phantom experiments.
There is significant potential for in silico training to produce the next generation
of MIS systems, deploying artificial intelligence to assist providers in alleviating the
inherent challenges of minimally invasive approaches, in particular by improving the
acquisition and interpretation of intra-operative images. In silico simulation provides
a training ground limited only by the constraints of generating realistic-looking
images and the techniques currently available for doing so. In principle, as data representations
and physics-based simulations continue to mature, in silico training can expand
to include near limitless experience for learning-based algorithms, from supervised
learning to reinforcement learning, by providing rich annotations from the perfectly
controlled virtual environment.
One notable constraint on in silico simulation for MIS is the availability of
patient data on which to base simulated images and interactions. Much of this review
has focused on 2D imaging modalities for exactly this reason, since the generation
of endoscopic, X-ray, or ultrasound images allows for hundreds or thousands of
training samples to originate from a single patient model [44, 98, 162]. For example,
DRRs vary widely in visual appearance based on the position and orientation of
the virtual C-arm, and further techniques such as domain randomization increase
the variance of training data to further improve sim-to-real transfer [210, 216, 217].
However, existing techniques for generating these realistic-looking images rely on
3D patient models based on patient data, including CT, MRI, or prior endoscopic
reconstructions, for example. This introduces a potential bottleneck to in silico
training, where the long-tailed distribution of real-world situations presents too much
variation for large but still finite data sets to yield models with sufficient generalizability.
Moreover, simulation of 3D intra-operative images, such as CT, MRI, and 3D US,
must rely on existing digital phantoms such as XCAT [82], which tend to produce
images with lower variation and realism.
As previously discussed, generalizability differs from sim-to-real domain
adaptation in that it is concerned not just with image appearance or tissue
characteristics, for example, but with any number of variations that may arise
in the course of surgery, where anomalous anatomy and complications produce
images outside the training domain. Imaging techniques, surgical setups, and patient
demographics vary significantly from one institution to another, so it is challenging
to train AI models that are resilient to such variations [87, 234]. The reliance on
finite patient data underlying current in silico simulation methods implies that not
all patient variation can be represented, and unseen conditions, such as the presence
of foreign surgical instruments from a manufacturer not considered during training,
may result in deteriorated performance.
An integrated physics engine is a crucial way that in silico training platforms
can increase the utility of each patient model, introducing variation based on tissue
deformation and tool-to-tissue physical interaction. In the visible light domain, a
great deal of attention has focused on simulation for robotic laparoscopy, including
complex interactions like suturing [99] and camera manipulation [29]. These rely
either on hand-built virtual environments or patient-based reconstructions, which
enable more realistic image formation [98]. In order to realize the full potential of in
silico training beyond the visible light domain, there is a need to develop simulation
frameworks for image formation of CT, MRI, X-ray, and ultrasound based on sparse
data representations that are conducive to physics engine modeling. Leveraging the
existing physics and robotic modeling capabilities of simulators like AMBF [45] will
require developing rendering frameworks like DeepDRR [44] (X-ray) and Field II
[101] (ultrasound) that produce images capable of overcoming the sim-to-real gap
for each modality, although these existing frameworks for non-optical simulation
rely on volumetric methods to produce realistic images. Since converting between
Disclosures
This work was supported by the NIH under Grant No. R21EB028505 and Johns
Hopkins University Internal Funds.
References
Clin. N. Am., vol. 8, no. 4, pp. 891–906, Nov. 1998. [Online]. Available:
https://pubmed.ncbi.nlm.nih.gov/9917931
[3] M. Wong, S. Morris, K. Wang, and K. Simpson, “Managing Postoperative
Pain After Minimally Invasive Gynecologic Surgery in the Era of the Opioid
Epidemic,” J. Minim. Invasive Gynecol., vol. 25, no. 7, pp. 1165–1178, Nov.
2018.
[4] K. Mohiuddin and S. J. Swanson, “Maximizing the benefit of minimally
invasive surgery,” J. Surg. Oncol., vol. 108, no. 5, pp. 315–319, Oct. 2013.
[5] R. E. Goldstein, L. Blevins, D. Delbeke, and W. H. Martin, “Effect of
Minimally Invasive Radioguided Parathyroidectomy on Efficacy, Length of
Stay, and Costs in the Management of Primary Hyperparathyroidism,” Ann.
Surg., vol. 231, no. 5, p. 732, May 2000.
[6] T. Tarin, A. Feifer, S. Kimm, L. Chen, D. Sjoberg, J. Coleman, and P. Russo,
“Impact of a Common Clinical Pathway on Length of Hospital Stay in Patients
Undergoing Open and Minimally Invasive Kidney Surgery,” J. Urol., vol. 191,
no. 5, pp. 1225–1230, May 2014.
[7] T. Cheng, T. Liu, G. Zhang, X. Peng, and X. Zhang, “Does minimally
invasive surgery improve short-term recovery in total knee arthroplasty?” Clin.
Orthop., vol. 468, no. 6, pp. 1635–1648, Jun. 2010.
[8] G. M. Jonsdottir, S. Jorgensen, S. L. Cohen, K. N. Wright, N. T. Shah,
N. Chavan, and J. I. Einarsson, “Increasing Minimally Invasive Hysterectomy:
Effect on Cost and Complications,” Obstet. Gynecol., vol. 117, no. 5, pp. 1142–
1149, May 2011.
[9] M. Gatz, A. Driessen, J. Eschweiler, M. Tingart, and F. Migliorini, “Open
versus minimally-invasive surgery for Achilles tendon rupture: a meta-analysis
study,” Arch. Orthop. Trauma Surg., vol. 141, no. 3, pp. 383–401, Mar. 2021.
[10] N. Ahmidi, G. D. Hager, L. Ishii, G. L. Gallia, and M. Ishii, “Robotic Path
Planning for Surgeon Skill Evaluation in Minimally-Invasive Sinus Surgery,”
in Medical Image Computing and Computer-Assisted Intervention – MICCAI
2012. Berlin, Germany: Springer, 2012, pp. 471–478.
[11] R. C. Setliff, “Minimally Invasive Sinus Surgery: The Rationale and the
Technique,” Otolaryngol. Clin. North Am., vol. 29, no. 1, pp. 115–129, Feb.
1996.
[21] N. Theocharopoulos, K. Perisinakis, J. Damilakis, G. Papadokostakis,
A. Hadjipavlou, and N. Gourtsoyiannis, “Occupational Exposure from
Common Fluoroscopic Projections Used in Orthopaedic Surgery,” JBJS,
vol. 85, no. 9, pp. 1698–1703, Sep. 2003. [Online]. Available: https://
journals.lww.com/jbjsjournal/fulltext/2003/09000/occupational_exposure_
from_common_fluoroscopic.7.aspx?casa_token=Vsl1Btk05L0AAAAA:
-PhS9duGvTpxiknHIo80d3kFSMSSrk3qmUK8yWCW8OKhaMPfYMvUbA9G3mW8mDozYv8B
[22] J. L. Cook, J. L. Tomlinson, and A. L. Reed, “Fluoroscopically Guided Closed
Reduction and Internal Fixation of Fractures of the Lateral Portion of the
Humeral Condyle: Prospective Clinical Study of the Technique and Results in
Ten Dogs,” Vet. Surg., vol. 28, no. 5, pp. 315–321, Sep. 1999.
[23] J. Sándor, B. Lengyel, T. Haidegger, G. Saftics, G. Papp, Á. Nagy, and
G. Wéber, “Minimally invasive surgical technologies: Challenges in education
and training,” Asian Journal of Endoscopic Surgery, vol. 3, no. 3, pp. 101–108,
Aug. 2010.
[24] R. H. Taylor, N. Simaan, A. Menciassi, and G.-Z. Yang, “Surgical robotics and
computer-integrated interventional medicine,” Proceedings of the IEEE, vol.
110, no. 7, pp. 823–834, 2022.
[25] L. Chen, W. Tang, N. W. John, T. R. Wan, and J. J. Zhang, “De-smokeGCN:
Generative Cooperative Networks for Joint Surgical Smoke Detection and
Removal,” IEEE Trans. Med. Imaging, vol. 39, no. 5, pp. 1615–1625, Nov.
2019.
[26] B. Bier, F. Goldmann, J.-N. Zaech, J. Fotouhi, R. Hegeman, R. Grupp,
M. Armand, G. Osgood, N. Navab, A. Maier, and M. Unberath, “Learning
to detect anatomical landmarks of the pelvis in X-rays from arbitrary views,”
Int. J. CARS, vol. 14, no. 9, pp. 1463–1473, Sep. 2019.
[27] K. Li, Y. Xu, J. Wang, D. Ni, L. Liu, and M. Q.-H. Meng, “Image-Guided
Navigation of a Robotic Ultrasound Probe for Autonomous Spinal Sonography
Using a Shadow-Aware Dual-Agent Framework,” IEEE Transactions on
Medical Robotics and Bionics, vol. 4, no. 1, pp. 130–144, Nov. 2021.
[28] A. Munawar, Z. Li, P. Kunjam, N. Nagururu, A. S. Ding, P. Kazanzides,
T. Looi, F. X. Creighton, R. H. Taylor, and M. Unberath, “Virtual reality
Magnetic Resonance Materials in Physics, Biology and Medicine, vol. 27, pp.
419–424, 2014.
[90] N. B. Semmineh, A. M. Stokes, L. C. Bell, J. L. Boxerman, and C. C. Quarles,
“A population-based digital reference object (dro) for optimizing dynamic
susceptibility contrast (dsc)-mri methods for clinical trials,” Tomography,
vol. 3, no. 1, pp. 41–49, 2017.
[91] W. Kainz, E. Neufeld, W. E. Bolch, C. G. Graff, C. H. Kim, N. Kuster,
B. Lloyd, T. Morrison, P. Segars, Y. S. Yeom et al., “Advances in computational
human phantoms and their applications in biomedical engineering—a topical
review,” IEEE transactions on radiation and plasma medical sciences, vol. 3,
no. 1, pp. 1–23, 2018.
[92] G. Cloutier, G. Soulez, S. D. Qanadli, P. Teppaz, L. Allard, Z. Qin, F. Cloutier,
and L.-G. Durand, “A multimodality vascular imaging phantom with fiducial
markers visible in dsa, cta, mra, and ultrasound,” Medical Physics, vol. 31,
no. 6, pp. 1424–1433, 2004.
[93] B. Driscoll, H. Keller, and C. Coolens, “Development of a dynamic flow imaging
phantom for dynamic contrast-enhanced ct,” Medical physics, vol. 38, no. 8,
pp. 4866–4880, 2011.
[94] A. Shukla-Dave, N. A. Obuchowski, T. L. Chenevert, S. Jambawalikar,
L. H. Schwartz, D. Malyarenko, W. Huang, S. M. Noworolski, R. J. Young,
M. S. Shiroishi et al., “Quantitative imaging biomarkers alliance (qiba)
recommendations for improved precision of dwi and dce-mri derived biomarkers
in multicenter oncology trials,” Journal of Magnetic Resonance Imaging,
vol. 49, no. 7, pp. e101–e121, 2019.
[95] C. Wu, D. A. Hormuth II, T. Easley, V. Eijkhout, F. Pineda, G. S. Karczmar,
and T. E. Yankeelov, “An in silico validation framework for quantitative dce-
mri techniques based on a dynamic digital phantom,” Medical Image Analysis,
vol. 73, p. 102186, 2021.
[96] S. Marchesseau, T. Heimann, S. Chatelin, R. Willinger, and H. Delingette,
“Fast porous visco-hyperelastic soft tissue model for surgery simulation:
Application to liver surgery,” Prog. Biophys. Mol. Biol., vol. 103, no. 2, pp.
185–196, Dec. 2010.
[97] J. Y. Wu, A. Munawar, M. Unberath, and P. Kazanzides, “Learning Soft-Tissue
Photons Plus Ultrasound: Imaging and Sensing 2019. SPIE, Feb. 2019, vol.
10878, pp. 95–102.
[173] M. Xu and L. V. Wang, “Photoacoustic imaging in biomedicine,” Rev. Sci.
Instrum., vol. 77, no. 4, p. 041101, Apr. 2006.
[174] M. Shi, Z. Wang, T. Zhao, S. J. West, A. E. Desjardins, T. Vercauteren, and
W. Xia, “Enhancing Photoacoustic Visualisation of Clinical Needles with Deep
Learning,” in 2021 IEEE International Ultrasonics Symposium (IUS). IEEE,
Sep. 2021, pp. 1–4.
[175] M. Shi, T. Zhao, S. J. West, A. E. Desjardins, T. Vercauteren, and W. Xia,
“Improving needle visibility in LED-based photoacoustic imaging using deep
learning with semi-synthetic datasets,” Photoacoustics, vol. 26, p. 100351, Jun.
2022.
[176] D. Chen and R. J. McGough, “A 2d fast near-field method for calculating
near-field pressures generated by apodized rectangular pistons,” Journal of the
Acoustical Society of America, vol. 124, no. 5, pp. 1526–1537, 2008.
[177] J. F. Kelly and R. J. McGough, “A time-space decomposition method for
calculating the nearfield pressure generated by a pulsed circular piston,” IEEE
Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 53,
no. 6, pp. 1150–1159, 2006.
[178] Y. Zhu, T. L. Szabo, and R. J. McGough, “A comparison of ultrasound image
simulations with FOCUS and field II,” in 2012 IEEE International Ultrasonics
Symposium. IEEE, Oct. 2012, pp. 1694–1697.
[179] A. Arjas, E. J. Alles, E. Maneas, S. Arridge, A. Desjardins, M. J. Sillanpää, and
A. Hauptmann, “Neural Network Kalman Filtering for 3-D Object Tracking
From Linear Array Ultrasound Data,” IEEE Trans. Ultrason. Ferroelectr. Freq.
Control, vol. 69, no. 5, pp. 1691–1702, Mar. 2022.
[180] L. Bartha, A. Lasso, C. Pinter, T. Ungi, Z. Keri, and G. Fichtinger, “Open-
source surface mesh-based ultrasound-guided spinal intervention simulator,”
Int. J. CARS, vol. 8, no. 6, pp. 1043–1051, Nov. 2013.
[181] H. Patel and I. Hacihaliloglu, “Improved Automatic Bone Segmentation Using
Large-Scale Simulated Ultrasound Data to Segment Real Ultrasound Bone
Surface Data,” in 2020 IEEE 20th International Conference on Bioinformatics
and Bioengineering (BIBE). IEEE, Oct. 2020, pp. 288–294.
[192] J. Masonis, C. Thompson, and S. Odum, “Safe and accurate: learning the
direct anterior total hip arthroplasty.” Orthopedics, vol. 31, no. 12 Suppl
2, p. orthosupersite.com/view.asp?rID=37187, Dec. 2008. [Online]. Available:
https://europepmc.org/article/med/19298019
[193] M. L. C. Routt, Jr., S. E. Nork, and W. J. Mills, “Percutaneous Fixation of Pelvic Ring Disruptions,” Clinical Orthopaedics and Related Research®, vol. 375, pp. 15–29, Jun. 2000. [Online]. Available: https://journals.lww.com/clinorthop/fulltext/2000/06000/percutaneous_fixation_of_pelvic_ring_disruptions.4.aspx?casa_token=dLjKpjc-lIoAAAAA:HRooQQ0Q8Dk8wyBsBOeawyrpNZ_LkQcmIOxqTftQUpD2csLkGHZx0LxXeSP4QYs-2xC9cPe-7Hdxwu1H8273CMZq
[194] Y. Otake, M. Armand, R. S. Armiger, M. D. Kutzer, E. Basafa, P. Kazanzides,
and R. H. Taylor, “Intraoperative Image-based Multiview 2D/3D Registration
for Image-Guided Orthopaedic Surgery: Incorporation of Fiducial-Based C-
Arm Tracking and GPU-Acceleration,” IEEE Trans. Med. Imaging, vol. 31,
no. 4, pp. 948–962, Nov. 2011.
[195] R. Harstall, P. F. Heini, R. L. Mini, and R. Orler, “Radiation Exposure to
the Surgeon During Fluoroscopically Assisted Percutaneous Vertebroplasty: A
Prospective Study,” Spine, vol. 30, no. 16, pp. 1893–1898, Aug. 2005.
[196] R. Kloeckner, A. Bersch, D. P. dos Santos, J. Schneider, C. Düber, and
M. B. Pitton, “Radiation Exposure in Nonvascular Fluoroscopy-Guided
Interventional Procedures,” Cardiovasc. Intervent. Radiol., vol. 35, no. 3, pp.
613–620, Jun. 2012.
[197] D. L. Miller, E. Vañó, G. Bartal, S. Balter, R. Dixon, R. Padovani, B. Schueler,
J. F. Cardella, and T. de Baère, “Occupational Radiation Protection in
Interventional Radiology: A Joint Guideline of the Cardiovascular and
Interventional Radiology Society of Europe and the Society of Interventional
Radiology,” Cardiovasc. Intervent. Radiol., vol. 33, no. 2, pp. 230–239, Apr.
2010.
[198] M. Zellerhoff, Y. Deuerling-Zheng, C. M. Strother, A. Ahmed, K. Pulfer,
T. Redel, K. Royalty, J. Grinde, and D. Consigny, “Measurement of cerebral
blood volume using angiographic C-arm systems,” in Proceedings Volume 7262,
Medical Imaging 2009: Biomedical Applications in Molecular, Structural, and
Functional Imaging. SPIE, Feb. 2009, vol. 7262, pp. 121–128.
from X-ray in temporal bone surgery,” Int. J. CARS, vol. 15, no. 7, pp. 1137–
1145, Jul. 2020.
[225] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and
R. Ng, “NeRF: representing scenes as neural radiance fields for view synthesis,”
Commun. ACM, vol. 65, no. 1, pp. 99–106, Dec. 2021.
[226] P. N. Huy and T. M. Quan, “Neural Radiance Projection,” in 2022 IEEE 19th
International Symposium on Biomedical Imaging (ISBI). IEEE, Mar. 2022,
pp. 1–5.
[227] T. M. Quan, H. M. Thanh, T. D. Huy, N. Do Trung Chanh, N. T. P. Anh, P. H.
Vu, N. H. Nam, T. Q. Tuong, V. M. Dien, B. Van Giang, B. H. Trung, and
S. Q. H. Truong, “XPGAN: X-Ray Projected Generative Adversarial Network
For Improving Covid-19 Image Classification,” in 2021 IEEE 18th International
Symposium on Biomedical Imaging (ISBI). IEEE, Apr. 2021, pp. 1509–1513.
[228] E. Abadi, B. Harrawood, S. Sharma, A. Kapadia, W. P. Segars, and E. Samei,
“DukeSim: A Realistic, Rapid, and Scanner-Specific Simulation Framework
in Computed Tomography,” IEEE Trans. Med. Imaging, vol. 38, no. 6, pp.
1457–1465, Jun. 2019.
[229] M. Zarei, E. Abadi, R. Fricks, W. P. Segars, and E. Samei, “A probabilistic
conditional adversarial neural network to reduce imaging variation in
radiography,” in Proceedings Volume 11595, Medical Imaging 2021: Physics
of Medical Imaging. SPIE, Feb. 2021, vol. 11595, pp. 1026–1033.
[230] Y. Lee, X. Tan, C.-B. Chng, and C.-K. Chui, “Simulation of Robot-Assisted
Flexible Needle Insertion Using Deep Q-Network,” in 2019 IEEE International
Conference on Systems, Man and Cybernetics (SMC). IEEE, Oct. 2019, pp.
342–346.
[231] J.-N. Zaech, C. Gao, B. Bier, R. Taylor, A. Maier, N. Navab, and M. Unberath,
“Learning to Avoid Poor Images: Towards Task-aware C-arm Cone-beam CT
Trajectories,” in Medical Image Computing and Computer Assisted Intervention
– MICCAI 2019. Cham, Switzerland: Springer, Oct. 2019, pp. 11–19.
[232] M. Thies, J.-N. Zäch, C. Gao, R. Taylor, N. Navab, A. Maier, and M. Unberath,
“A learning-based method for online adjustment of C-arm Cone-beam CT
source trajectories for artifact avoidance,” Int. J. CARS, vol. 15, no. 11, pp.
1787–1796, Nov. 2020.