2022 in Silico Review
March 2023
1. Introduction
are not routinely recorded nor stored further complicates the curation of large
data sets that are needed for training machine learning algorithms, including deep
convolutional neural networks and transformer architectures, which are the backbone of
modern intelligent systems.
In silico training circumvents these obstacles through the use of computer
simulation as an ethical, controlled, and scalable source of data for intelligent systems
in MIS. This approach is already prominent in related problems of general robotic
manipulation [41, 42] and self-driving [43], each of which contends with challenges
specific to the given task, environment, and data modality. In the context of
MIS, numerous simulation frameworks have been developed by researchers either
with machine learning specifically in mind [44–46] or as training tools for clinicians
and subsequently adapted for machine learning applications [29, 37]. One of the
most mature examples is the SimNow software suite, developed for the da Vinci
Surgical Robot, which includes training routines for 33 surgical skills [47], as well
as complete procedures (Intuitive Surgical Operations, Inc.; Sunnyvale, California).
The increasing maturity of this and other simulation frameworks for MIS has
heightened interest in in silico training for intelligent systems as well as clinicians.
In this article, we review the progress in and challenges facing in silico training of
intelligent MIS systems. Given the differences in application, reviews of simulation
training for general robotics, such as [48], focus on the use of RGB(-D) imaging,
which generally does not apply in the context of MIS. Further, although previous
reviews include recent advances in MIS [49–60], robotic-assisted MIS [56, 61–64],
machine learning in surgical interventions [34, 35, 65–73], or surgical simulation for
human training purposes [74–77], in silico training specifically for intelligent MIS
systems remains an emerging area deserving of an introduction. We focus this review
on frameworks and successful applications in three imaging modalities which have
received the bulk of researchers’ attention, namely endoscopy, ultrasound, and X-ray.
For each simulation framework, we review the image quality in terms of sim-to-real
transfer, simulation dynamism, computational resources, and time cost.
The outline of the review is as follows. In Section 2, we introduce overarching
concepts relevant to simulation. Diving into recent progress, Sections 3, 4, 5, and
6 explore the various simulation frameworks and their applications for training
intelligent systems in MIS, covering endoscopic and stereo microscopic, ultrasound (US),
X-ray, and computed tomography (CT) imaging, respectively. Table 1 summarizes
the available simulation frameworks for each modality. We first introduce frameworks
which have received the most attention and subsequently highlight recent alternatives
that may be of interest; we focus on these modalities because they have attracted
the most in silico training effort from the community. In Section 7, we
call attention to the capabilities of systems developed for different modalities and
speculate on future directions in this exciting area.
Figure 1: Simulation methods for 2D image formation generally adhere to two major
approaches based on the data structure: (a) volumetric simulation and (b) sparse
simulation, which may use point clouds, surface meshes (as shown here), spline-based
object models, or other spatial data structures. In (a), dense grid data makes up a
volume, which individual rays “march” through to model energy-matter interactions.
In (b), a sparse triangle mesh models the surface of an object, which one or more
rays intersect at a finite set of points. The latter approach is commonly used in
computer graphics, with open source tools such as Blender [81] available for creating
3D scenes with complicated meshes; such meshes can also be sampled from statistical
shape models [82].
in the optical domain. In Fig. 1a, the computation of a given pixel value involves
“marching” along the corresponding ray and processing the contributions at each
point in the volume [83]. When considering interaction effects, like scattering, this
technique can result in photo-realistic rendering of complicated scenes, but until
recently it was computationally prohibitive for generating large datasets. The use
of increasingly powerful GPU devices has made volumetric methods more feasible
for a variety of applications. Fig. 1b, by contrast, considers the point where the ray
intersects with a surface mesh, a more efficient problem in terms of computational
time and memory [84].
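To make the volumetric approach of Fig. 1a concrete, the following Python sketch accumulates attenuation along a single ray through a dense volume and applies the Beer-Lambert law; the nearest-neighbor lookup, step size, and units are simplifying assumptions for illustration rather than the implementation of any particular framework.

    import numpy as np

    def march_ray(volume, spacing, origin, direction, step=0.5, n_steps=512):
        """Accumulate attenuation along one ray through a dense volume (Fig. 1a).

        volume    -- 3D array of attenuation coefficients (assumed units of 1/mm)
        spacing   -- voxel size in mm along each axis
        origin    -- ray origin in world coordinates (mm)
        direction -- ray direction (need not be normalized)
        """
        direction = np.asarray(direction, dtype=float)
        direction = direction / np.linalg.norm(direction)
        origin = np.asarray(origin, dtype=float)
        line_integral = 0.0
        for i in range(n_steps):
            p = origin + i * step * direction              # world-space sample point
            idx = np.round(p / spacing).astype(int)        # nearest-neighbor voxel lookup
            if np.any(idx < 0) or np.any(idx >= volume.shape):
                continue                                   # sample lies outside the volume
            line_integral += volume[tuple(idx)] * step
        return np.exp(-line_integral)                      # transmitted intensity (Beer-Lambert)

Frameworks such as DeepDRR [44] implement the same line integral far more efficiently on the GPU, with interpolation and material- and energy-dependent attenuation; this sketch only conveys the basic data flow.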
The benefits of sparse simulation methods primarily stem from their increased
computational efficiency, which can enable simulations with more capable dynamic
interactions, while volumetric methods contain more detail for rendering realistic
images. A sparse patient model, for example, consists of enclosed meshes segmenting
each organ, as in [82]. This enables fast simulation of physical interactions, such
as collision detection and realistic tissue deformation, and, for visible light imaging
modalities, in which reflective properties of surfaces are responsible for a large portion
of image formation, it can provide the basis for photorealistic image rendering [81].
Simulating transmissive imaging modalities, however, such as X-ray and US, requires
modeling the energy-matter interactions throughout a medium, including the interior
of surface meshes. For sparse models, this can be accomplished by modeling the
distribution of attenuation properties within each region, e.g. by assuming uniform
density and material composition or using splines [85], but volumetric patient models
are inherently better suited for this task because they contain dense material property
data. Thus for transmissive modalities, volumetric models (with sufficient spatial
resolution) enable more realistic simulation of image formation, while sparse methods
are sufficient for realistic rendering of visible light images and support more powerful
simulation dynamics.
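For the sparse representation of Fig. 1b, the core geometric operation is a ray-triangle intersection test. A standard Möller-Trumbore implementation is sketched below for a single triangle; inputs are NumPy 3-vectors, and a full renderer would accelerate this with a bounding-volume hierarchy.

    import numpy as np

    def ray_triangle_intersection(origin, direction, v0, v1, v2, eps=1e-9):
        """Return the distance along the ray to the triangle (v0, v1, v2), or None."""
        e1, e2 = v1 - v0, v2 - v0
        p = np.cross(direction, e2)
        det = np.dot(e1, p)
        if abs(det) < eps:                      # ray is parallel to the triangle plane
            return None
        inv_det = 1.0 / det
        s = origin - v0
        u = np.dot(s, p) * inv_det
        if u < 0.0 or u > 1.0:                  # intersection lies outside the triangle
            return None
        q = np.cross(s, e1)
        v = np.dot(direction, q) * inv_det
        if v < 0.0 or u + v > 1.0:
            return None
        t = np.dot(e2, q) * inv_det
        return t if t > eps else None           # distance along the (unit) ray direction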
The quality of imaging simulation directly affects the sim-to-real performance
of the system, that is its ability to train in silico but perform on the corresponding
domain of real images. In machine learning, this problem is referred to as domain
adaptation. One domain adaptation approach, which we find to be commonly used
in the papers described here, is the use of generative adversarial networks (GANs) to
translate images from the simulation domain to the real, using an unsupervised cycle-
consistent loss [86]. Separately, it should be noted that the sim-to-real capability
of a model differs from its generalizability. The former is a matter of image
appearance and tool-to-tissue interaction (if considered), while the latter describes an
intelligent system’s performance on samples and situations not seen during training,
either in the simulation or real domain. As an example, a bone segmentation
algorithm may achieve high sim-to-real performance but still fail when confronted
with anomalous anatomy or fractures, regardless of the domain. The differences in
imaging parameters and techniques between institutions also introduce the need for
generalizability [87]. In general, it is understood that training a generalizable model
requires sufficient variation in the simulation parameters, including image formation
characteristics as well as patient demographics, pathologies, and anatomical features.
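To make the GAN-based sim-to-real translation mentioned above concrete, the sketch below shows the cycle-consistency term of [86] in PyTorch; the generator networks G_sim2real and G_real2sim and the weighting are placeholders for illustration, not any specific published implementation.

    import torch.nn.functional as F

    def cycle_consistency_loss(G_sim2real, G_real2sim, sim_batch, real_batch, lam=10.0):
        """Unsupervised cycle-consistency term used alongside the adversarial losses.

        G_sim2real, G_real2sim -- image-to-image generator networks (hypothetical)
        sim_batch, real_batch  -- unpaired batches of simulated and real images
        """
        # Translate each domain to the other and back again.
        sim_cycle = G_real2sim(G_sim2real(sim_batch))
        real_cycle = G_sim2real(G_real2sim(real_batch))
        # Penalize the round trip for changing image content.
        return lam * (F.l1_loss(sim_cycle, sim_batch) + F.l1_loss(real_cycle, real_batch))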
In order to facilitate in silico training, it is necessary to obtain ground truth
annotations of the image. For 2D imaging modalities, this often involves propagating
information from the 3D simulation, where positions and orientations for objects of
Figure 2: Pinhole camera geometry often used for modeling endoscopic, stereoscopic,
and X-ray imaging. The camera projection matrix P = K[R|t] can be used to
propagate 3D annotations, like the landmark x̃, to the image plane.
interest are known, to the 2D image being simulated. For example, in endoscopic,
stereoscopic, and X-ray imaging, the intrinsic parameters of the camera geometry—
its focal length and pixel spacing—can be modeled in the 3 × 3 intrinsic matrix K,
while the position and orientation of the camera are encoded in the extrinsic matrix
[R|t]. Thus for a homogeneous 3D point x̃ = [x, y, z, 1] in the simulated world, such
as an anatomical landmark, the corresponding homogeneous point in the image can
be determined as
ũ = K [R|t] x̃, (1)
as in Fig. 2. Similar operations can determine image-space projections for slice-
to-volume transformations, as they arise in ultrasound, and extend to different
annotations such as lines, volumes, and 3D segmentations, enabling richly annotated,
large-scale datasets to be generated from relatively few annotations of the underlying
3D data.
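A direct translation of Eq. (1) into NumPy, assuming the intrinsic matrix K, rotation R, and translation t of the simulated camera are known, might read as follows.

    import numpy as np

    def project_landmark(K, R, t, x_world):
        """Project a 3D landmark into pixel coordinates via u = K [R|t] x.

        K       -- 3x3 intrinsic matrix (focal lengths and principal point)
        R, t    -- 3x3 rotation and 3-vector translation (camera extrinsics)
        x_world -- 3D landmark position in world coordinates
        """
        x_h = np.append(np.asarray(x_world, dtype=float), 1.0)            # homogeneous 3D point
        P = K @ np.hstack([R, np.asarray(t, dtype=float).reshape(3, 1)])  # 3x4 projection matrix
        u_h = P @ x_h                                                     # homogeneous image point
        return u_h[:2] / u_h[2]                                           # pixel coordinates (u, v)

The same matrices define the viewing geometry of a simulated endoscope or C-arm, so a single annotated 3D landmark yields a 2D label for every rendered view.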
support robots with closed-link kinematic chains and redundant mechanisms, which
are common in surgical robotics but not supported by popular robotics frameworks.
In addition, the simulator supports non-rigid bodies, such as soft tissue, and
built-in models of real-world surgical robots. These properties have supported
immersive simulation of lateral skull base surgery for simultaneous training and
data generation, suitable for training machine learning models [46].
Building upon the AMBF, Munawar et al. [99] propose a unified simulation
platform for robotic surgery with the da Vinci robot to facilitate the prototyping
of various scenes, including an example suturing task. Their contributions enable
real-time control and collision detection within the simulation, as well as generation
of ground truth data for machine learning, enabling human-driven simulation to
provide kinematic as well as image data over extended sequences.
To support simultaneous human training and in silico collection of data
for machine learning applications, Munawar et al. [28] further extend the
aforementioned functionality to include haptic feedback for lateral skull base
surgery as well as a simulated stereo microscope to enable depth perception in the
operating field. They demonstrate the utility of this approach for the purposes of
training intelligent systems by training a stereo depth network (STTR [126]) on the
simulation images.
Varier et al. [100] propose AMBF-RL to support real-time reinforcement learning
(RL) for surgical robotic tasks. They validate their approach using in silico training
of an RL agent for debris removal with the dVRK Patient Side Manipulator (PSM)
and successfully transfer the optimal policy to a real robot.
3.2. Blender
Cartucho et al. [98] introduce VisionBlender, a software plugin that allows users to create
highly realistic computer vision datasets with segmentation maps, depth maps, and
other ground truth, using the 3D modeling software, Blender [81], as a backbone.
VisionBlender is designed with robotic surgical applications specifically in mind,
supporting playback through the ROS platform to simulate real-time data collection
from a da Vinci robot via dVRK. [127] utilize VisionBlender [98] to generate a
simulated laparoscopic dataset to evaluate the performance of their improved marker
for surgical tool tracking.
Chen et al. [25] use Blender [81] to simulate smoke in endoscopic video, such
data collection while reporting similar challenges, namely the lack of tool-to-tissue
interaction through the re-play paradigm.
Wu et al. [136] introduce a soft tissue simulator that is unified with the dVRK
framework and uses the SOFA framework [137] as the physics simulator. Using robotic
vision and kinematic data, they train a network to learn correction factors for finite
element method (FEM) simulations from the discrepancy between simulations and real
observations. In their follow-on work [138], the authors present a faster approach,
where they implement a step-wise framework in the network for interactive soft-tissue
simulation and real-time observations.
Several datasets are available for visible light imaging modalities, based on in
silico simulation, although the simulation framework itself may not be available.
The Surgical Visual Domain Challenge had participants transfer skills from
simulated da Vinci surgery (based on the SimNow simulator) to real endoscopic video
[139]; the data are available as the “SurgVisDom dataset.” Madapana et al. [140] investigate
sim-to-real skill transfer in the context of dexterous surgical skills, simulating a dataset
for the aforementioned peg transfer task and collecting corresponding instances on
real robots, namely the Taurus II and YuMi. Their DESK dataset is publicly
available. Rahman et al. [141] use a simulated OpenAI Gym environment and the
real-world DESK dataset to evaluate robotic activity classification methods.
4. Ultrasound Imaging
models of wave propagation. They can rely on sparse representations of data, with
finite point sources at known locations, or volumetric patient data based on CT or
MRI. In the latter case, it is necessary to process the original image, which measures
either X-ray attenuation in the case of CT or nuclear spin in MRI, to infer tissue
elasticity and reflectance properties relevant to US. If an organ segmentation is
available, the ultrasound absorption and attenuation can be estimated separately
for each tissue type based on values in the literature [152]. A distinctive trait of US
images is speckle, which arises from micro-inhomogeneities in tissue [153]. Although
such structure is smaller than the spatial resolution of CT or MRI data, it can likewise
be modeled based on work done for tissue characterization purposes [154, 155].
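As a sketch of this segmentation-based assignment, the mapping below converts an organ label volume into speed-of-sound and attenuation maps; the labels and numerical values are placeholders, not the literature values of [152].

    import numpy as np

    # Hypothetical per-tissue acoustic properties (speed of sound [m/s],
    # attenuation [dB/cm/MHz]); real values would be taken from the literature.
    ACOUSTIC_PROPERTIES = {
        0: (1540.0, 0.5),   # background soft tissue
        1: (1580.0, 0.7),   # muscle
        2: (3500.0, 10.0),  # bone
    }

    def labels_to_acoustic_maps(label_volume):
        """Convert an organ segmentation into acoustic property volumes for US simulation."""
        speed = np.zeros(label_volume.shape, dtype=np.float32)
        attenuation = np.zeros(label_volume.shape, dtype=np.float32)
        for label, (c, a) in ACOUSTIC_PROPERTIES.items():
            mask = label_volume == label
            speed[mask] = c
            attenuation[mask] = a
        return speed, attenuation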
In general, the US community prefers MATLAB toolkits, although Python
wrappers exist for several frameworks (Python being preferred by the machine
learning community due to [156, 157]). Below, we review these tools and their
applications for image-guided minimally invasive surgery.
4.1. Field II
Originally developed starting in 1991, Field [101–103] represents one of the early
complete solutions for computational simulation of ultrasound image formation.
Based on the Tupholme-Stepanishen method [158–160], Field can simulate any kind
of transducer geometry and excitation, with fast computation made possible by a far-
field approximation. In 1997, Field II [161] parallelizes execution to reduce simulation
time significantly, eventually enabling simulation of 128 images in 42 seconds with
full precision (0.33 seconds per image) in 2014 [162]. As of December 2022, Field II
continues to enjoy a wide user base and regular software updates, compatible with
modern processors and major operating systems (Windows, Mac OS, and Linux).
Parallel execution and a native Python implementation are available for commercial
use as a part of Field II Pro.‡
The fast inference time of deep neural networks (DNNs) means that real-time
applications can improve image quality in the operating room. Hyun et al. [163]
use Field II to simulate a training set for the purpose of speckle reduction in B-
mode images, introducing log-domain loss functions tailored for US. They showed
that speckle reduction using their DNN trained in silico outperformed the traditional
delay-and-sum and nonlocal means method on real images, in terms of preservation
‡ http://field-ii.dk/.
4.2. k-Wave
One way to reduce the computation necessary for US imaging is the k-space
pseudospectral method, which discretizes the equations modeling nonlinear wave
propagation [170]. The open-source tool k-Wave [105] is a MATLAB toolbox,
capable of simulating physically realistic images quickly, using a planar detector
geometry based on the fast Fourier transform. Unlike other tools, k-Wave
improves computational speed through parallel execution on graphics processing
units, simulating 1000 time-steps for a grid size of 128³ in 7 minutes (approximately
0.4 s per time-step), a more than 2.5-fold speedup compared to multi-threaded CPU
execution.
US image guidance often requires visualization of point-like targets, such as
needle cross sections or catheters, which can be confused with point-like artifacts
commonly present in the surgical setting. Allman et al. [171] show the advantage
of DNNs in this area, which are able to distinguish true point sources with 96.67%
accuracy in phantom, after training in silico with k-Wave [105]. They achieved sub-
millimeter point source localization error (0.38 ± 0.25 mm), enabling visualization of
a novel artifact-free image in the context of MIS. A similar approach uses simulated
k-Wave data to precisely localize point-like vessel targets for guidance in MIS
[172].
The goal of distinguishing point sources with high accuracy may benefit from
photoacoustic imaging, where the absorption of optical or radio-frequency EM
excitations causes tissue to generate acoustic waves [173]. Combining in silico and in
vivo data for training, [174, 175] explore the use of light-emitting diodes (LEDs)
as excitation sources for photoacoustic visualization of clinical needles in MIS,
developing a DNN-based system that enhances the visualization and achieves a 4.3-times
higher signal-to-noise ratio compared to conventional reconstructions. Their semi-
synthetic approach allows for complete knowledge of the desired ground truth while
reducing the sim-to-real gap.
4.3. FOCUS
The Fast Object-Oriented C++ Ultrasound Simulator (FOCUS) is a fast US
simulator available for MATLAB [106]. It resolves large errors in ultrasound
simulation in the nearfield and at the transducer face using the fast nearfield
method [176] and time-space decomposition [177]. Comparing the simulations of
FOCUS and Field II reveals this difference for a sampling frequency as low as 25
MHz, where the impulse response calculation in Field II introduces aliasing artifacts
[178].
Because US images typically contain only 2D data from a linear array, resolving
3D pose based on the work in the previous section can be nontrivial. Arjas
et al. [179] propose a solution based on in silico training combined with a Kalman
filter to improve localization over continuous US acquisition, as is common in the
surgical setting. They train a DNN based on the FOCUS simulator [106], although
applications to inhomogeneous tissue would require simulation with the k-Wave
framework. Nevertheless, their approach shows promise for the critical task of
reconstructing 3D tool poses from 2D US images, achieving 0.3 mm maximum error
when transferring to real US images of needles submerged in water.
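The combination of per-frame DNN estimates with a Kalman filter can be pictured as in the sketch below; the constant-velocity motion model and noise magnitudes are illustrative assumptions, not the filter design of Arjas et al. [179].

    import numpy as np

    def kalman_step(x, P, z, dt, q=1e-3, r=0.1):
        """One predict/update step tracking a 3D needle tip from per-frame DNN estimates.

        x -- state [px, py, pz, vx, vy, vz];  P -- 6x6 covariance;  z -- measured position
        """
        F = np.eye(6)
        F[:3, 3:] = dt * np.eye(3)                      # constant-velocity motion model
        H = np.hstack([np.eye(3), np.zeros((3, 3))])    # only position is measured
        Q, R = q * np.eye(6), r * np.eye(3)
        x = F @ x                                       # predict
        P = F @ P @ F.T + Q
        y = z - H @ x                                   # innovation from the DNN estimate
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y                                   # update
        P = (np.eye(6) - K @ H) @ P
        return x, P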
Although an open source framework for this approach is not available, their method
is straightforward enough to be re-implemented with standard tools. Moreover, they
demonstrate that this approach, combined with image augmentation, achieved a
15.6% higher DICE score on real US images than networks trained with Field II,
while simulation of these images was 36,000 times faster, leveraging the fast Fourier
transform [184]. However, like Garcia-Peraza-Herrera et al. [133], this method only
simulates the added objects; complete ground truth for the underlying real images
is not available.
Rather than compute 2D US images from digital phantoms, Li et al. [27] propose
to use volumetric ultrasound acquisitions to sample slices for in silico training of an
autonomous system for achieving standard US views in spinal sonography. This
approach is efficient and results in highly realistic images, which can be sampled
dynamically from many viewpoints, although real patient scans are required. They
demonstrate that this in silico environment is suitable for deep reinforcement learning
(DRL), achieving standard views of the spine to within reasonable margins (5.18 mm /
5.25° in the intra-subject setting).
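A minimal version of this slice-sampling idea, assuming the volumetric acquisition is stored as a NumPy array and using SciPy for interpolation, could look as follows; the plane parametrization is illustrative rather than the probe model of [27].

    import numpy as np
    from scipy.ndimage import map_coordinates

    def sample_slice(volume, center, u_axis, v_axis, size=(256, 256), spacing=0.5):
        """Resample a 2D imaging plane from a 3D ultrasound volume.

        center          -- voxel coordinates of the plane center
        u_axis, v_axis  -- orthonormal in-plane direction vectors (voxel units)
        size, spacing   -- output image size in pixels and pixel spacing in voxels
        """
        center = np.asarray(center, dtype=float)
        u_axis = np.asarray(u_axis, dtype=float)
        v_axis = np.asarray(v_axis, dtype=float)
        h, w = size
        us = (np.arange(w) - w / 2) * spacing
        vs = (np.arange(h) - h / 2) * spacing
        uu, vv = np.meshgrid(us, vs)
        # Voxel coordinates of every pixel on the oblique plane.
        coords = (center[:, None, None]
                  + uu[None] * u_axis[:, None, None]
                  + vv[None] * v_axis[:, None, None])
        return map_coordinates(volume, coords, order=1, mode="nearest")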
The Verasonics Vantage ultrasound scanners are a US hardware family well-
suited to US research, due to the easy access to raw ultrasound data. Along with this
hardware, Verasonics supplies an US simulator that is integrated with their imaging
platform, making it an attractive research tool that could streamline translation
onto real devices, similar to PLUS [107, 110]. Although no theoretical work
exists describing their proprietary software, information regarding their approach
is available on their website.§
Recently, in 2022, [109, 110] propose SIMUS, an open-source ultrasound
simulator for MATLAB, as a part of the MATLAB ultrasound toolbox (MUST)
[185]. SIMUS simulates the acoustic pressure fields, performing comparably to Field
II, k-Wave, FOCUS, and Verasonics in terms of image realism. The simplicity and
online availability of this framework make it useful for pedagogical purposes as well
as research.
Finally, [186] demonstrate the sim-to-real capabilities of a robotic US imaging
system, learning based on RGB images and sensor readings rather than the US
images themselves. This approach reduces the simulation problem to one that is more
aligned with general robotic manipulation, although it will be an essential component
of more complete simulation of an intelligent MIS system. A full simulation of the
operating room as a dynamic in silico environment, i.e. a digital twin, will require
simulation of the given medical imaging modality as well as non-medical data sources
such as RGB cameras and force sensors.
§ https://verasonics.com/ultrasound-simulator/
5. X-ray Imaging
5.1. CONRAD
CONRAD is an early framework for simulating realistic radiographs,
providing the first unified library for cone-beam imaging that incorporates nonlinear
physical effects and projective geometry, written in Java and accelerated with
OpenCL [85]. Primarily intended for applications in CT reconstruction, CONRAD
relies on a spline-based object model to represent the attenuation properties of
patients and tools in space. This sparse data representation results in images that
are highly controllable but may lack the realism needed for sim-to-real transfer in
the X-ray domain. Nevertheless, in 2018 CONRAD was adapted for
in silico training of intelligent systems in cardiac statistical shape modeling [203],
digital subtraction angiography (DSA) [204], and motion reduction in diagnostic knee
imaging [205]. DSA refers to the acquisition of subsequent fluoroscopic images with
and without contrast agent, the subtraction of which yields a background-free image
focusing on the blood vessel [206]. Virtual, single-frame DSA removes the need for a
non-contrast acquisition by segmenting the foreground vessel from the background,
which Unberath et al. [204] accomplish automatically with a U-Net trained on
CONRAD. Finally, although Bier et al. [205] consider diagnostic applications,
specifically compensating for patient motion during load-bearing acquisition of
CBCT, their proposal to automatically detect anatomical landmarks from projective
images, based on in silico training, has since gained ground in MIS applications.
5.2. MC-GPU
Monte Carlo simulation enables realistic, physics-based simulation of radiation
transport, but requires significant compute time. Developed with the aim of
accelerating simulation of X-ray and CT images, MC-GPU features massively parallel
Monte Carlo simulation of photon interactions with volumetric patient models,
leveraging advancements in graphical processing units (GPUs) [111, 207]. Compared
to single-core CPU execution, MC-GPU achieves a maximum 27-fold speedup in
simulation time, using hardware available in 2009. Despite these advantages, MC-GPU
still requires enough compute time that it remains impractical for generating datasets
on the scale needed for training deep learning algorithms.
5.3. DeepDRR
To catalyze in silico training in the X-ray domain, Unberath et al. [44, 112] contribute
a Python framework for fast, physics-based synthesis of digitally reconstructed radiographs (DRRs) with sufficient realism
for sim-to-real transfer. While previous approaches that focused on image realism
employed Monte Carlo simulation of photon absorption and scatter, DeepDRR
approximates these effects in a physically realistic manner by projecting through
segmented CT volumes and subsequently estimating photon scatter with a DNN,
trained on Monte Carlo ground truth, which enables generation of tens of thousands
from the same view with and without metal implants in the spine, enabling a
DNN to inpaint radiographs and remove metal implants. This can improve 2D/3D
registration of the spine, wherein the sharp contrast of metal implants tends to inhibit
intensity-based registration methods.
Building on physics-based DRR synthesis, Toth et al. [216] show that domain
randomization (DR) can improve DRR-to-X-ray performance for cardiac 2D/3D
registration. They train an RL agent to iteratively update a cardiac model to align
with real radiographs, demonstrating higher stability when DR is applied during
training. The purpose of DR, which applies unrealistic image transformations to
the training set, is to introduce such large variation that the DNN avoids local,
domain-specific minima in the loss function. Previously mentioned work has similarly
utilized DR for fully automatic 2D/3D registration in the pelvis, based on anatomical
landmark detection [210].
The advantages of domain randomization are convincingly demonstrated in Gao et al. [217],
which shows that physics-based X-ray synthesis using DeepDRR, combined with strong DR,
is comparable to GAN-based domain adaptation and outperforms GAN-based domain adaptation
applied to conventional DRRs, although this work is not yet peer-reviewed. This is
advantageous because the image
transformations involved in “strong DR,” such as image inversion, blurring, warping,
and coarse dropout, among others, are computationally inexpensive, whereas GANs
require additional training with sufficient real images as an unlabeled reference.
They demonstrate this approach, coined “SyntheX,” on three representative tasks:
pelvic landmark detection for 2D/3D registration, detection and segmentation of a
continuum manipulator, and COVID-19 diagnosis from chest X-rays [217].
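The image-space transformations used for strong DR are computationally trivial; the sketch below applies a representative subset (inversion, blurring, coarse dropout), with probabilities and parameter ranges chosen arbitrarily for illustration rather than taken from SyntheX [217].

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def strong_domain_randomization(image, rng=None):
        """Apply deliberately unrealistic transformations to a simulated image in [0, 1]."""
        rng = rng or np.random.default_rng()
        out = image.astype(np.float32).copy()
        if rng.random() < 0.5:                            # random intensity inversion
            out = 1.0 - out
        if rng.random() < 0.5:                            # random Gaussian blur
            out = gaussian_filter(out, sigma=rng.uniform(0.5, 3.0))
        if rng.random() < 0.5:                            # coarse dropout patches
            h, w = out.shape
            for _ in range(int(rng.integers(1, 5))):
                ph = int(rng.integers(h // 8, h // 4))
                pw = int(rng.integers(w // 8, w // 4))
                y = int(rng.integers(0, h - ph))
                x = int(rng.integers(0, w - pw))
                out[y:y + ph, x:x + pw] = rng.random()
        return out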
Inspired by [205], Kausch et al. [18] propose an intelligent system for automatic
C-arm repositioning, regressing a pose update based on intermediate landmark
detection to obtain standard views of the pelvis. Without additional fine-tuning,
their system obtained desired clinical views when evaluated on real X-rays, to within
inter-rater variability [18]. They refine this work in the domain of spine surgery,
introducing K-wire and pedicle screw implants as augmentations to the training set
[218]. K-wire simulation was accomplished through post-processing of the X-ray
image with quadratic Bézier curves, while pedicle screws were simulated in the DRR
synthesis process similar to Unberath et al. [112]. Related work in this area uses a
DNN to estimate the C-arm pose in simulation [219].
Recent applications of in silico training with DeepDRR continue to focus on
is learned based on real X-rays. As in RealDRR and XPGAN [227], they use a GAN
[86] to improve image realism as a final step. Although NeRPs are only used for
in silico training of a diagnostic intelligent system, this simulation framework has
potential applications in MIS, where novel view synthesis is a desirable capability. A
framework for their approach is not yet available.
To minimize radiation dose during CT imaging, Abadi et al. [228] propose a
volumetric, Monte Carlo simulation framework with device- and institution-specific
parameters for imaging, called DukeSim. Although intended for CT simulation,
DukeSim’s individual projections have been used in combination with voxelized
statistical shape models [82] to reduce variability in X-ray imaging due to exactly
these factors, which can affect the consistency and reliability of diagnostic imaging
[229].
Finally, although it is not specifically developed with machine learning in
mind, DRRGenerator [113] may yet be of interest to the community because of
its intuitive user interface and integration with the open-source medical imaging
software, 3D Slicer [108]. Currently, the popular DeepDRR tool requires users to
develop a sampling strategy of sufficiently varied views in order to guarantee view-
invariant DNN performance, as in [209]. With additional capabilities focused on
in silico training, DRRGenerator would be to X-ray image-guided interventions as
VisionBlender and AMBF+ are to endoscopic and stereo microscopic image-guided
procedures, respectively.
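One simple sampling strategy is to draw C-arm orientations uniformly from clinically plausible ranges and jitter the isocenter, as in the sketch below; the parameter ranges are placeholders rather than those of [209].

    import numpy as np

    def sample_carm_views(n_views, alpha_range=(-40.0, 40.0), beta_range=(-20.0, 20.0),
                          offset_mm=25.0, rng=None):
        """Sample random C-arm poses for generating varied DRR training views.

        alpha_range, beta_range -- orbital rotation and angulation ranges in degrees
        offset_mm               -- maximum isocenter jitter along each axis in mm
        """
        rng = rng or np.random.default_rng()
        views = []
        for _ in range(n_views):
            alpha = rng.uniform(*alpha_range)                      # orbital rotation
            beta = rng.uniform(*beta_range)                        # angulation
            offset = rng.uniform(-offset_mm, offset_mm, size=3)    # isocenter jitter
            views.append((alpha, beta, offset))
        return views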
Much of the interest in in silico training for MIS has focused on 2D imaging
modalities: endoscopy, US, and X-ray. We speculate that this is because large
datasets suitable for training DNNs are easier to obtain in the 2D domain, where
a single annotation of a 3D image can be propagated to thousands of samples.
Nevertheless, simulating 3D images and 3D physical interactions is of interest to
develop intelligent systems focused on intra-operative CT, MRI, and 3D US.
Toward this end, Lee et al. [230] propose a simulated environment for training
autonomous needle insertion robots, using RL. They model the deformation of a
beveled tip needle in a dynamic environment based on stochastic processes, providing
negative rewards for collisions with obstacles such as bone, and positive rewards
when it reaches the biopsy target. In the future, physics-based simulation of needle
insertion may support both CT-guided and, through a platform like DeepDRR [44],
X-ray-guided procedures.
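The reward structure described above can be summarized in a few lines; the numerical values and termination tolerance below are illustrative assumptions rather than those of Lee et al. [230].

    import numpy as np

    def needle_insertion_reward(tip_position, target, in_collision, tolerance=2.0):
        """Shaped reward for an RL needle-insertion agent (illustrative values).

        tip_position, target -- 3D positions in mm
        in_collision         -- callable returning True if the tip intersects a
                                forbidden structure such as bone
        """
        distance = float(np.linalg.norm(np.asarray(target) - np.asarray(tip_position)))
        if in_collision(tip_position):
            return -10.0                 # penalize collisions with obstacles
        if distance < tolerance:
            return 10.0                  # reaching the biopsy target
        return -0.01 * distance          # small shaping term encouraging progress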
CT guidance must contend with severe image artifacts introduced by metal
implants, such as pedicle screws. In order to minimize metal artifacts in intra-
operative cone-beam CT (CBCT), Zaech et al. [231] and Thies et al. [232] train a DNN
to adjust C-arm trajectories during image acquisition, using DeepDRR for in silico
training. Their method iteratively adjusts the out-of-plane angle of a robotic C-arm
to avoid “poor images” characterized by beam hardening, photon starvation, and
noise.
Recently, image-free guidance for MIS has been explored. Årsvold et al.
[233] simulate electrical properties of target tissue types in order to train an
intelligent system for minimally invasive lymphadenectomy, a surgical treatment for
cancer. Their system determines whether a lymph node is present underneath an electrical
impedance scanner, which can be deployed as part of a robotic assisted MIS system,
achieving 93.49% accuracy in ex vivo tissue phantom experiments.
There is significant potential for in silico training to produce the next generation
of MIS systems, deploying artificial intelligence to assist providers in alleviating the
inherent challenges of minimally invasive approaches, in particular by improving the
acquisition and interpretation of intra-operative images. In silico simulation provides
a training ground limited only by the constraints of generating realistic-looking
images and the techniques currently available for doing so. In principle, as data representations
and physics-based simulations continue to mature, in silico training can expand
to include near limitless experience for learning-based algorithms, from supervised
learning to reinforcement learning, by providing rich annotations from the perfectly
controlled virtual environment.
One notable constraint on in silico simulation for MIS is the availability of
patient data on which to base simulated images and interactions. Much of this review
has focused on 2D imaging modalities for exactly this reason, since the generation
of endoscopic, X-ray, or ultrasound images allows for hundreds or thousands of
training samples to originate from a single patient model [44, 98, 162]. For example,
DRRs vary widely in visual appearance based on the position and orientation of
the virtual C-arm, and further techniques such as domain randomization increase
the variance of training data to further improve sim-to-real transfer [210, 216, 217].
However, existing techniques for generating these realistic-looking images rely on
3D patient models based on patient data, including CT, MRI, or prior endoscopic
reconstructions, for example. This introduces a potential bottleneck to in silico
training, where the long-tailed distribution of real-world situations presents too much
variation for large but still finite data sets to yield models with sufficient generalizability.
Moreover, simulation of 3D intra-operative images, such as CT, MRI, and 3D US,
must rely on existing digital phantoms such as XCAT [82], which tend to produce
images with lower variation and realism.
As previously discussed, generalizability differs from sim-to-real domain
adaptation in that it is concerned not just with image appearance or tissue
characteristics, for example, but with any number of variations that may arise
in the course of surgery, where anomalous anatomy and complications produce
images outside the training domain. Imaging techniques, surgical setups, and patient
demographics vary significantly from one institution to another, so it is challenging
to train AI models that are resilient to such variations [87, 234]. The reliance on
finite patient data underlying current in silico simulation methods implies that not
all patient variation can be represented, and unseen conditions, such as the presence
of foreign surgical instruments from a manufacturer not considered during training,
may result in deteriorated performance.
An integrated physics engine is a crucial way that in silico training platforms
can increase the utility of each patient model, introducing variation based on tissue
deformation and tool-to-tissue physical interaction. In the visible light domain, a
great deal of attention has focused on simulation for robotic laparoscopy, including
complex interactions like suturing [99] and camera manipulation [29]. These rely
either on hand-built virtual environments or patient-based reconstructions, which
enable more realistic image formation [98]. In order to realize the full potential of in
silico training beyond the visible light domain, there is a need to develop simulation
frameworks for image formation of CT, MRI, X-ray, and ultrasound based on sparse
data representations that are conducive to physics engine modeling. Leveraging the
existing physics and robotic modeling capabilities of simulators like AMBF [45] will
require developing rendering frameworks like DeepDRR [44] (X-ray) and Field II
[101] (ultrasound) that produce images capable of overcoming the sim-to-real gap
for each modality, although these existing frameworks for non-optical simulation
rely on volumetric methods to produce realistic images. Since converting between
Disclosures
This work was supported by the NIH under Grant No. R21EB028505 and Johns
Hopkins University Internal Funds.
References
Clin. N. Am., vol. 8, no. 4, pp. 891–906, Nov. 1998. [Online]. Available:
https://pubmed.ncbi.nlm.nih.gov/9917931
[3] M. Wong, S. Morris, K. Wang, and K. Simpson, “Managing Postoperative
Pain After Minimally Invasive Gynecologic Surgery in the Era of the Opioid
Epidemic,” J. Minim. Invasive Gynecol., vol. 25, no. 7, pp. 1165–1178, Nov.
2018.
[4] K. Mohiuddin and S. J. Swanson, “Maximizing the benefit of minimally
invasive surgery,” J. Surg. Oncol., vol. 108, no. 5, pp. 315–319, Oct. 2013.
[5] R. E. Goldstein, L. Blevins, D. Delbeke, and W. H. Martin, “Effect of
Minimally Invasive Radioguided Parathyroidectomy on Efficacy, Length of
Stay, and Costs in the Management of Primary Hyperparathyroidism,” Ann.
Surg., vol. 231, no. 5, p. 732, May 2000.
[6] T. Tarin, A. Feifer, S. Kimm, L. Chen, D. Sjoberg, J. Coleman, and P. Russo,
“Impact of a Common Clinical Pathway on Length of Hospital Stay in Patients
Undergoing Open and Minimally Invasive Kidney Surgery,” J. Urol., vol. 191,
no. 5, pp. 1225–1230, May 2014.
[7] T. Cheng, T. Liu, G. Zhang, X. Peng, and X. Zhang, “Does minimally
invasive surgery improve short-term recovery in total knee arthroplasty?” Clin.
Orthop., vol. 468, no. 6, pp. 1635–1648, Jun. 2010.
[8] G. M. Jonsdottir, S. Jorgensen, S. L. Cohen, K. N. Wright, N. T. Shah,
N. Chavan, and J. I. Einarsson, “Increasing Minimally Invasive Hysterectomy:
Effect on Cost and Complications,” Obstet. Gynecol., vol. 117, no. 5, pp. 1142–
1149, May 2011.
[9] M. Gatz, A. Driessen, J. Eschweiler, M. Tingart, and F. Migliorini, “Open
versus minimally-invasive surgery for Achilles tendon rupture: a meta-analysis
study,” Arch. Orthop. Trauma Surg., vol. 141, no. 3, pp. 383–401, Mar. 2021.
[10] N. Ahmidi, G. D. Hager, L. Ishii, G. L. Gallia, and M. Ishii, “Robotic Path
Planning for Surgeon Skill Evaluation in Minimally-Invasive Sinus Surgery,”
in Medical Image Computing and Computer-Assisted Intervention – MICCAI
2012. Berlin, Germany: Springer, 2012, pp. 471–478.
[11] R. C. Setliff, “Minimally Invasive Sinus Surgery: The Rationale and the
Technique,” Otolaryngol. Clin. North Am., vol. 29, no. 1, pp. 115–129, Feb.
1996.
[21] N. Theocharopoulos, K. Perisinakis, J. Damilakis, G. Papadokostakis,
A. Hadjipavlou, and N. Gourtsoyiannis, “Occupational Exposure from
Common Fluoroscopic Projections Used in Orthopaedic Surgery,” JBJS,
vol. 85, no. 9, pp. 1698–1703, Sep. 2003. [Online]. Available: https://
journals.lww.com/jbjsjournal/fulltext/2003/09000/occupational_exposure_
from_common_fluoroscopic.7.aspx?casa_token=Vsl1Btk05L0AAAAA:
-PhS9duGvTpxiknHIo80d3kFSMSSrk3qmUK8yWCW8OKhaMPfYMvUbA9G3mW8mDozYv8B
[22] J. L. Cook, J. L. Tomlinson, and A. L. Reed, “Fluoroscopically Guided Closed
Reduction and Internal Fixation of Fractures of the Lateral Portion of the
Humeral Condyle: Prospective Clinical Study of the Technique and Results in
Ten Dogs,” Vet. Surg., vol. 28, no. 5, pp. 315–321, Sep. 1999.
[23] J. Sándor, B. Lengyel, T. Haidegger, G. Saftics, G. Papp, Á. Nagy, and
G. Wéber, “Minimally invasive surgical technologies: Challenges in education
and training,” Asian Journal of Endoscopic Surgery, vol. 3, no. 3, pp. 101–108,
Aug. 2010.
[24] R. H. Taylor, N. Simaan, A. Menciassi, and G.-Z. Yang, “Surgical robotics and
computer-integrated interventional medicine,” Proceedings of the IEEE, vol.
110, no. 7, pp. 823–834, 2022.
[25] L. Chen, W. Tang, N. W. John, T. R. Wan, and J. J. Zhang, “De-smokeGCN:
Generative Cooperative Networks for Joint Surgical Smoke Detection and
Removal,” IEEE Trans. Med. Imaging, vol. 39, no. 5, pp. 1615–1625, Nov.
2019.
[26] B. Bier, F. Goldmann, J.-N. Zaech, J. Fotouhi, R. Hegeman, R. Grupp,
M. Armand, G. Osgood, N. Navab, A. Maier, and M. Unberath, “Learning
to detect anatomical landmarks of the pelvis in X-rays from arbitrary views,”
Int. J. CARS, vol. 14, no. 9, pp. 1463–1473, Sep. 2019.
[27] K. Li, Y. Xu, J. Wang, D. Ni, L. Liu, and M. Q.-H. Meng, “Image-Guided
Navigation of a Robotic Ultrasound Probe for Autonomous Spinal Sonography
Using a Shadow-Aware Dual-Agent Framework,” IEEE Transactions on
Medical Robotics and Bionics, vol. 4, no. 1, pp. 130–144, Nov. 2021.
[28] A. Munawar, Z. Li, P. Kunjam, N. Nagururu, A. S. Ding, P. Kazanzides,
T. Looi, F. X. Creighton, R. H. Taylor, and M. Unberath, “Virtual reality
Magnetic Resonance Materials in Physics, Biology and Medicine, vol. 27, pp.
419–424, 2014.
[90] N. B. Semmineh, A. M. Stokes, L. C. Bell, J. L. Boxerman, and C. C. Quarles,
“A population-based digital reference object (dro) for optimizing dynamic
susceptibility contrast (dsc)-mri methods for clinical trials,” Tomography,
vol. 3, no. 1, pp. 41–49, 2017.
[91] W. Kainz, E. Neufeld, W. E. Bolch, C. G. Graff, C. H. Kim, N. Kuster,
B. Lloyd, T. Morrison, P. Segars, Y. S. Yeom et al., “Advances in computational
human phantoms and their applications in biomedical engineering—a topical
review,” IEEE transactions on radiation and plasma medical sciences, vol. 3,
no. 1, pp. 1–23, 2018.
[92] G. Cloutier, G. Soulez, S. D. Qanadli, P. Teppaz, L. Allard, Z. Qin, F. Cloutier,
and L.-G. Durand, “A multimodality vascular imaging phantom with fiducial
markers visible in dsa, cta, mra, and ultrasound,” Medical Physics, vol. 31,
no. 6, pp. 1424–1433, 2004.
[93] B. Driscoll, H. Keller, and C. Coolens, “Development of a dynamic flow imaging
phantom for dynamic contrast-enhanced ct,” Medical physics, vol. 38, no. 8,
pp. 4866–4880, 2011.
[94] A. Shukla-Dave, N. A. Obuchowski, T. L. Chenevert, S. Jambawalikar,
L. H. Schwartz, D. Malyarenko, W. Huang, S. M. Noworolski, R. J. Young,
M. S. Shiroishi et al., “Quantitative imaging biomarkers alliance (qiba)
recommendations for improved precision of dwi and dce-mri derived biomarkers
in multicenter oncology trials,” Journal of Magnetic Resonance Imaging,
vol. 49, no. 7, pp. e101–e121, 2019.
[95] C. Wu, D. A. Hormuth II, T. Easley, V. Eijkhout, F. Pineda, G. S. Karczmar,
and T. E. Yankeelov, “An in silico validation framework for quantitative dce-
mri techniques based on a dynamic digital phantom,” Medical Image Analysis,
vol. 73, p. 102186, 2021.
[96] S. Marchesseau, T. Heimann, S. Chatelin, R. Willinger, and H. Delingette,
“Fast porous visco-hyperelastic soft tissue model for surgery simulation:
Application to liver surgery,” Prog. Biophys. Mol. Biol., vol. 103, no. 2, pp.
185–196, Dec. 2010.
[97] J. Y. Wu, A. Munawar, M. Unberath, and P. Kazanzides, “Learning Soft-Tissue
Photons Plus Ultrasound: Imaging and Sensing 2019. SPIE, Feb. 2019, vol.
10878, pp. 95–102.
[173] M. Xu and L. V. Wang, “Photoacoustic imaging in biomedicine,” Rev. Sci.
Instrum., vol. 77, no. 4, p. 041101, Apr. 2006.
[174] M. Shi, Z. Wang, T. Zhao, S. J. West, A. E. Desjardins, T. Vercauteren, and
W. Xia, “Enhancing Photoacoustic Visualisation of Clinical Needles with Deep
Learning,” in 2021 IEEE International Ultrasonics Symposium (IUS). IEEE,
Sep. 2021, pp. 1–4.
[175] M. Shi, T. Zhao, S. J. West, A. E. Desjardins, T. Vercauteren, and W. Xia,
“Improving needle visibility in LED-based photoacoustic imaging using deep
learning with semi-synthetic datasets,” Photoacoustics, vol. 26, p. 100351, Jun.
2022.
[176] D. Chen and R. J. McGough, “A 2d fast near-field method for calculating
near-field pressures generated by apodized rectangular pistons,” Journal of the
Acoustical Society of America, vol. 124, no. 5, pp. 1526–1537, 2008.
[177] J. F. Kelly and R. J. McGough, “A time-space decomposition method for
calculating the nearfield pressure generated by a pulsed circular piston,” IEEE
Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 53,
no. 6, pp. 1150–1159, 2006.
[178] Y. Zhu, T. L. Szabo, and R. J. McGough, “A comparison of ultrasound image
simulations with FOCUS and field II,” in 2012 IEEE International Ultrasonics
Symposium. IEEE, Oct. 2012, pp. 1694–1697.
[179] A. Arjas, E. J. Alles, E. Maneas, S. Arridge, A. Desjardins, M. J. Sillanpää, and
A. Hauptmann, “Neural Network Kalman Filtering for 3-D Object Tracking
From Linear Array Ultrasound Data,” IEEE Trans. Ultrason. Ferroelectr. Freq.
Control, vol. 69, no. 5, pp. 1691–1702, Mar. 2022.
[180] L. Bartha, A. Lasso, C. Pinter, T. Ungi, Z. Keri, and G. Fichtinger, “Open-
source surface mesh-based ultrasound-guided spinal intervention simulator,”
Int. J. CARS, vol. 8, no. 6, pp. 1043–1051, Nov. 2013.
[181] H. Patel and I. Hacihaliloglu, “Improved Automatic Bone Segmentation Using
Large-Scale Simulated Ultrasound Data to Segment Real Ultrasound Bone
Surface Data,” in 2020 IEEE 20th International Conference on Bioinformatics
and Bioengineering (BIBE). IEEE, Oct. 2020, pp. 288–294.
[192] J. Masonis, C. Thompson, and S. Odum, “Safe and accurate: learning the
direct anterior total hip arthroplasty.” Orthopedics, vol. 31, no. 12 Suppl
2, p. orthosupersite.com/view.asp?rID=37187, Dec. 2008. [Online]. Available:
https://europepmc.org/article/med/19298019
[193] M. L. C. Routt, Jr., S. E. Nork, and W. J. Mills, “Percutaneous Fixation of Pelvic Ring Disruptions,” Clinical Orthopaedics and Related Research®, vol. 375, pp. 15–29, Jun. 2000. [Online]. Available: https://journals.lww.com/clinorthop/fulltext/2000/06000/percutaneous_fixation_of_pelvic_ring_disruptions.4.aspx?casa_token=dLjKpjc-lIoAAAAA:HRooQQ0Q8Dk8wyBsBOeawyrpNZ_LkQcmIOxqTftQUpD2csLkGHZx0LxXeSP4QYs-2xC9cPe-7Hdxwu1H8273CMZq
[194] Y. Otake, M. Armand, R. S. Armiger, M. D. Kutzer, E. Basafa, P. Kazanzides,
and R. H. Taylor, “Intraoperative Image-based Multiview 2D/3D Registration
for Image-Guided Orthopaedic Surgery: Incorporation of Fiducial-Based C-
Arm Tracking and GPU-Acceleration,” IEEE Trans. Med. Imaging, vol. 31,
no. 4, pp. 948–962, Nov. 2011.
[195] R. Harstall, P. F. Heini, R. L. Mini, and R. Orler, “Radiation Exposure to
the Surgeon During Fluoroscopically Assisted Percutaneous Vertebroplasty: A
Prospective Study,” Spine, vol. 30, no. 16, pp. 1893–1898, Aug. 2005.
[196] R. Kloeckner, A. Bersch, D. P. dos Santos, J. Schneider, C. Düber, and
M. B. Pitton, “Radiation Exposure in Nonvascular Fluoroscopy-Guided
Interventional Procedures,” Cardiovasc. Intervent. Radiol., vol. 35, no. 3, pp.
613–620, Jun. 2012.
[197] D. L. Miller, E. Vañó, G. Bartal, S. Balter, R. Dixon, R. Padovani, B. Schueler,
J. F. Cardella, and T. de Baère, “Occupational Radiation Protection in
Interventional Radiology: A Joint Guideline of the Cardiovascular and
Interventional Radiology Society of Europe and the Society of Interventional
Radiology,” Cardiovasc. Intervent. Radiol., vol. 33, no. 2, pp. 230–239, Apr.
2010.
[198] M. Zellerhoff, Y. Deuerling-Zheng, C. M. Strother, A. Ahmed, K. Pulfer,
T. Redel, K. Royalty, J. Grinde, and D. Consigny, “Measurement of cerebral
blood volume using angiographic C-arm systems,” in Proceedings Volume 7262,
Medical Imaging 2009: Biomedical Applications in Molecular, Structural, and
Functional Imaging. SPIE, Feb. 2009, vol. 7262, pp. 121–128.
from X-ray in temporal bone surgery,” Int. J. CARS, vol. 15, no. 7, pp. 1137–
1145, Jul. 2020.
[225] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and
R. Ng, “NeRF: representing scenes as neural radiance fields for view synthesis,”
Commun. ACM, vol. 65, no. 1, pp. 99–106, Dec. 2021.
[226] P. N. Huy and T. M. Quan, “Neural Radiance Projection,” in 2022 IEEE 19th
International Symposium on Biomedical Imaging (ISBI). IEEE, Mar. 2022,
pp. 1–5.
[227] T. M. Quan, H. M. Thanh, T. D. Huy, N. Do Trung Chanh, N. T. P. Anh, P. H.
Vu, N. H. Nam, T. Q. Tuong, V. M. Dien, B. Van Giang, B. H. Trung, and
S. Q. H. Truong, “XPGAN: X-Ray Projected Generative Adversarial Network
For Improving Covid-19 Image Classification,” in 2021 IEEE 18th International
Symposium on Biomedical Imaging (ISBI). IEEE, Apr. 2021, pp. 1509–1513.
[228] E. Abadi, B. Harrawood, S. Sharma, A. Kapadia, W. P. Segars, and E. Samei,
“DukeSim: A Realistic, Rapid, and Scanner-Specific Simulation Framework
in Computed Tomography,” IEEE Trans. Med. Imaging, vol. 38, no. 6, pp.
1457–1465, Jun. 2019.
[229] M. Zarei, E. Abadi, R. Fricks, W. P. Segars, and E. Samei, “A probabilistic
conditional adversarial neural network to reduce imaging variation in
radiography,” in Proceedings Volume 11595, Medical Imaging 2021: Physics
of Medical Imaging. SPIE, Feb. 2021, vol. 11595, pp. 1026–1033.
[230] Y. Lee, X. Tan, C.-B. Chng, and C.-K. Chui, “Simulation of Robot-Assisted
Flexible Needle Insertion Using Deep Q-Network,” in 2019 IEEE International
Conference on Systems, Man and Cybernetics (SMC). IEEE, Oct. 2019, pp.
342–346.
[231] J.-N. Zaech, C. Gao, B. Bier, R. Taylor, A. Maier, N. Navab, and M. Unberath,
“Learning to Avoid Poor Images: Towards Task-aware C-arm Cone-beam CT
Trajectories,” in Medical Image Computing and Computer Assisted Intervention
– MICCAI 2019. Cham, Switzerland: Springer, Oct. 2019, pp. 11–19.
[232] M. Thies, J.-N. Zäch, C. Gao, R. Taylor, N. Navab, A. Maier, and M. Unberath,
“A learning-based method for online adjustment of C-arm Cone-beam CT
source trajectories for artifact avoidance,” Int. J. CARS, vol. 15, no. 11, pp.
1787–1796, Nov. 2020.